Preface


The  Taxis  project  aims  to  provide  tools  to  facilitate  the  design  of  large  interactive  information 
systems.  It  uses  a  framework  of  abstraction  principles  (including  classification, 
generalisation/specialisation,  and  aggregation),  based  on  work  on  knowledge  representation  in  the 
field  of  Artificial  Intelligence,  and  also  draws  on  results  from  Programming  Languages,  Databases, 
and  Software  Engineering.  This  report  collects  16  papers  from  the  project,  from  late  1982  to  early 
1984.  About  half  of  the  papers  have  been  presented  by  project  members  at  conferences;  some  appear 
in  this  report  in  revised  form. 

The  papers  are  grouped  into  seven  sections: 

1.  Overview  (2  papers) 

2.  Conceptual  Modelling  Languages  (2  papers) 

3.  Medical  Applications  (1  paper) 

4.  Implementation  and  Design  Issues  (2  papers) 

5. Requirements Modelling (4 papers)

6.  Linguistic  User  Interfaces  (3  papers) 

7.  Exceptions  and  Errors  (2  papers) 

The  first  section  includes  an  introduction  to  the  Taxis  project;  it  also  includes  a  discussion  of 
the  wider  area  of  artificial  intelligence  research  on  databases. 

The  next  section  includes  comparisons  of  Taxis  with  other  conceptual  modelling  languages, 
including  Galileo  and  Adaplex. 

The third section describes the application of Taxis to the design of some medical information systems.

The  modelling  of  long-term  processes  is  provided  by  Taxis  scripts,  which  are  based  on 
augmented  Petri  nets.  The  fourth  section  describes  an  efficient  implementation  of  scripts;  it  also 
describes  an  integrated,  interactive  design  environment  for  the  language. 

The  Taxis  framework  has  also  been  applied  to  the  initial  phase  of  designing  information 
systems.  The  fifth  section  presents  a  requirements  modelling  language,  which  captures  real-world 
knowledge,  and  has  a  formal  basis. 

The  next  section  describes  a  linguistic  component  for  Taxis,  intended  to  simplify  the  design  of 
user  interfaces  in  information  systems. 

The final section introduces a flexible framework for handling exceptions and errors in
information  systems. 

A  bibliography  for  the  project  is  also  provided. 


Requests  for  publications  may  be  addressed  to: 
Taxis  Project 

Department  of  Computer  Science 
University  of  Toronto 
Toronto,  Ontario,  Canada 
M5S  1A4 
A KNOWLEDGE REPRESENTATION APPROACH TO SOFTWARE ENGINEERING: THE TAXIS PROJECT


Sol  J.  Greenspan  and  John  Mylopoulos 


Department of Computer Science
University  of  Toronto 
Toronto,  Ontario 


Abstract This paper presents an overview and prospectus of the Taxis Project at the University of Toronto. The project addresses several problems of Software Engineering within a general unified framework. The framework offers a coherent set of basic principles for modeling and abstraction (drawn from work on knowledge representation in the field of Artificial Intelligence (AI)) and applies them to several aspects of system design and description. The paper describes the Taxis conceptual framework and approach, the current status of implemented tools, and a number of practical issues that have received attention from the project.

1 INTRODUCTION

1.1 Toward a Unified Approach to Software Engineering

Software  engineering  involves  a  number  of  system 
descriptions,  or  specifications,  which  serve  different 
purposes.  The  types  of  specifications  can  be  thought  of  as 
addressing several specification levels:

System requirements specification -- a description of the overall function that needs to be performed by some system.

System design specification -- a high-level description of the system architecture, i.e. the structure of the system in terms of interconnected modules.

Program specification -- the algorithms and data structures, described in a manner independent of what programming language(s) will be used.

Program -- a set of instructions and data, encoded in a programming language, to implement the program specification.

From  requirements  through  programming,  these  levels  range 
from  more  world-oriented  specifications  to  more  machine- 
oriented  specifications.  Requirements  are  world-oriented  in 
that  they  deal  with  concepts  of  one  or  more  application 
domains  and  describe  the  purpose  to  be  served  by  a  system 
in  a  real-life  environment.  The  successive  categories  move 
increasingly  closer  to  concepts  of  computation, 
programming  languages,  and  computers. 

Proceedings  of  the  Conference  of  the 
Canadian  Information  Processing  Society 
Ottawa,  Ontario,  May  16-20,  1983, 
pp.  163-174. 


A methodology for software engineering should
provide  at  least  three  things  for  each  level  of  specification: 

languages  --  precise  notations  for  expressing  the 
relevant  information. 

techniques  —  procedures  for  constructing, 
manipulating,  and  validating  specifications. 

tools  --  automated  aids  designed  to  support  the 
above. 

Advances in program design and programming include high-level language facilities, techniques such as structured programming and stepwise refinement, program analyzers, and programming support facilities. The same kinds of technology are needed for requirements and design if the overall task of software engineering is to be efficient and predictable.

Technological advances in programming have resulted from the ability to understand and formalize program concepts and the programming process. However, requirements and design are not as well understood, and due to their world-oriented nature, formalization is very difficult. Better languages, techniques, and tools need to be developed.

By  a  unified  approach  to  software  engineering,  we 
mean  one  that  combines  specifications  from  all  levels.  In 
this  regard,  it  is  useful  to  view  software  development  as  the 
construction  of  a  series  of  increasingly  more  complete 
models  of  a  system  at  different  stages  of  development.  All 
of  the  models  are  viewed  as  representing  the  same  system, 
but  the  first  takes  a  completely  world-oriented  view,  and 
successive  models  take  progressively  more  machine- 
oriented  viewpoints. 

At  various  levels,  we  need  to  model  at  least  the 
following  aspects  of  systems: 

Information  --  the  information  that  is  stored  in,  or 
can  be  extracted  from,  the  system. 

Behavior -- the prescribed ways that information can be changed over time.

User  interface  —  the  interaction  between  the 
system  and  users. 
Even a cursory glance at the state of the art shows that a number of descriptive approaches have been proposed and/or used, addressing different specification levels and different system aspects. For requirements and design, some of the prominent languages and tools are PSL/PSA [Teichroew77], used for information system description; RSL and its associated tools [Alford77], which are used for real-time systems; and SADT (a trademark of Softech, Inc.) [Ross77a, Ross77b], which has been widely used, mainly for requirements definition. At lower levels of specification are some system design and program specification techniques, e.g. [Jackson75].

In response to the critical need to manage large quantities of information, many organizations have turned to description techniques such as the Entity-Relationship approaches based on [Chen76], and, in the area of Data Base Management, data models are popular devices for descriptions at different levels of database design and implementation [Tsichritzis81]. There has also been a lot of work in the area of software specifications (e.g. [SRS79]), which, in contrast to the data models, has concentrated on behavior modeling rather than on modeling of the information aspect.

To  summarize,  current  approaches  address  particular 
aspects  at  particular  levels,  but  the  crucial  problem 
remains:  How  can  an  appropriate  set  of  modeling  facilities 
be  combined  into  a  unified  framework? 

1.2  The  Taxis  Project 

Within  the  Taxis  Project  at  the  University  of  Toronto, 
modeling  for  both  requirements  and  information  system 
design  is  addressed  within  the  same  descriptive  framework. 
The  Taxis  framework  offers  a  small  set  of  basic  principles 
which  are  applied  uniformly  to  all  aspects  of  modeling.  We 
believe that such a framework is essential to the
development  of  an  effective  software  engineering 
methodology.  The  basic  concepts  are  discussed  in  Section 
2. 

The  focus  of  the  Taxis  Project  is  the  Taxis  language, 
which was developed to provide a framework for information
system  design.  The  language  provides  a  notation  that 
supports  its  underlying  principles,  providing  a  natural  way  of 
encoding  knowledge  about  the  application  domain.  Taxis 
integrates,  within  a  framework  of  classes,  properties,  and 
inheritance,  the  notions  of  data  classes,  which  are  similar  to 
relations  [Codd70];  standard  Algol/Pascal  control 
structures;  and  procedural  exception-handling  as  in 
[Wasserman77].  Taxis  is  intermediate  between  a 
nonprocedural  conceptual  requirements  specification 
language and an ordinary application language plus a
database management system as used today.


The  Taxis  language  has  three  main  components: 

1.  The  semantic  component  of  Taxis  describes  a 
database,  including  transactions.  The  user  can 
specify  integrity  constraints,  exceptions,  and 
exception-handling  within  the  language. 

2.  The  pragmatic  component,  called  scripts,  addresses 
the  modeling  of  interaction  between  the  users  and 
the  system.  Scripts  provide  a  graphical  language 
based  on  Petri  nets,  combined  with  communication 
primitives,  for  modeling  the  flow  of  events  in  a  user's 
world  and  the  points  of  interaction  with  the  system. 

3.  The  linguistic  component  of  Taxis  offers  facilities 
for  designing  languages  (e.g.  query  languages)  for 
communicating  with  the  system. 

We argued earlier that software development
should  start  with  a  completely  world-oriented  model.  For 
this,  the  Taxis  methodology  has  another  component: 

A  requirements  model  is  a  description  of  some 
portion  of  the  world  that  encompasses  potential 
information  systems  and  is  used  to  communicate  and 
analyze  the  problem  situation,  and  provides  a 
starting  point  for  information  system  design  using 
Taxis. 

A model at this level corresponds to "Corporate Requirements" as defined in [Lum79]. A language, called the Requirements Modeling Language (RML), has been designed for this purpose.

We  will  discuss  the  Taxis  language  components  and 
RML  in  Section  3. 

2  FOUNDATIONS 

The  Taxis  Framework  has  two  main  ingredients. 

2.1 Object-Oriented Framework

First  of  all,  it  is  an  object-oriented  framework.  In 
this  respect  Taxis  is  similar  to  other  object-centered 
frameworks,  such  as  those  of  Simula  [Dahl72]  and  Smalltalk 
[Goldberg81]. In an object-oriented framework, a model
consists  of  a  set  of  interrelated  objects.  Each  object  in  the 
model  is  intended  to  stand  for  some  entity  or  concept  in  the 
world  or  system  being  modeled.  The  creation,  modification, 
and  manipulation  of  objects  is  taken  to  represent  the 
behavior  of  their  counterparts  in  the  world.  A  big  advantage 
of object-centered frameworks is the direct
correspondence between the model and the world.

The most prominent object-oriented frameworks are semantic networks [Quillian68, Findler79], which are popular schemes in Artificial Intelligence (AI) for representing knowledge to be used by knowledge-based systems such as "expert systems". We believe these schemes have a lot to offer in the area of Software Engineering for modeling and specification.

In AI research on semantic networks, several important issues, which are continually being investigated in AI, have not received as much attention in Software Engineering. These include, for example, the following: (a) formalization of rich semantic frameworks; (b) logical consistency and deduction; (c) precise representation when there is uncertainty or incompleteness; (d) specifications or models considered as evolving over time; (e) representation of defaults and exceptions; (f) philosophical underpinnings.

The  Taxis  framework  itself  derives  from  Procedural 
Semantic  Networks  [Levesque79]  and  has  the  benefit  of 
borrowing  insights  and  results  from  this  and  related  work  in 
AI.

2.2  Abstraction  Mechanisms 

The  second  basic  ingredient  of  the  Taxis  framework 
is  a  set  of  abstraction  mechanisms,  which  are  important  in  a 
modeling  language  for  structuring  and  organizing  large 
descriptions. 

An abstraction mechanism is a conceptual or linguistic mechanism that allows certain information to be highlighted while suppressing other information. In software engineering, abstraction is usually equated with the suppression of design decisions or implementation detail along the dimension we have been referring to above as "levels of specification". However, within a given level, the Taxis framework offers a set of complementary abstraction facilities based on the notions of aggregation, classification, and generalization [Smith77].

If  we  define  a  property  to  be  a  directed  relationship 
between  two  objects,  aggregation  allows  one  to  view  an 
object  as  a  composite  of  the  objects  to  which  it  is  related 
by  properties.  For  example,  a  person  has  a  name,  an 
address,  and  so  on.  The  "abstraction"  here  is  that  one  may 
talk  about  an  object  while  choosing  to  ignore  its 
components for the moment.

Along an orthogonal dimension, the classification abstraction allows individuals to be grouped into classes (and classes into metaclasses) that share common properties. A class represents a generic concept, such as "person" or "employee", and it also serves as a template for the members of the class, which are called its instances. The class is defined by structural (also called definitional) properties, which express such information as "persons have a name, an age, an address", and so on. Instances have factual properties which, for example, attribute specific names, ages, etc. to individual persons. The "abstraction" here is that one may describe a class without referring to its instances.


Generalization (and its converse, specialization) allows the common properties of several classes to be abstracted into the definition of a single, more general, class. For example, the class of persons can be represented as a generalization of the classes representing males, females, managers, engineers, female engineers, and so on.

Accordingly, classes are organized into a hierarchy with general classes located above their specializations. If one class is defined to be a specialization, or subclass, of another (a superclass), every instance of the first is considered to be an instance of the second, e.g., every instance of the class of employees is an instance of the class of persons. Perhaps the most important consequence of this organization is that (structural) properties can be inherited from superclass to subclass, e.g., the class of employees inherits properties such as name, address, and so on, from the class of persons.

Generalization  hierarchies  are  referred  to  as  IS-A 
hierarchies  in  Artificial  Intelligence.  The  taxonomic 
organization  provided  by  IS-A  hierarchies  can  lead  to  models 
that  are  understandable  and  consistent,  because  the  more 
that  classes  have  in  common  with  each  other,  the  "closer" 
they  are  located  to  each  other  in  the  hierarchy.  Also,  IS-A 
hierarchies can lead to more concise models, since it is sufficient to associate (structural) properties with the most general applicable class and let inheritance imply the rest.

The name "IS-A hierarchies" arose from sentences in English such as "an employee is a person" and "John is a person". One should be careful, however, to distinguish between them: the first expresses an IS-A relationship between two classes, while the second expresses a relationship between the class of persons and one of its instances.

The premise posed by the Taxis Project, that this simple framework based on abstraction mechanisms can be used effectively to structure models of all kinds and all parts of these models, does not mean that this framework by itself is a complete, universal, or general-purpose modeling framework, only that certain "epistemological primitives" [Brachman79] form its primary foundations. In fact, as the reader will see in the next section, Taxis uses special-purpose features for modeling each system aspect by allowing several types of objects and properties; these features, however, are all molded into the Taxis framework.

3 THE TAXIS LANGUAGES

This section presents a necessarily brief account of some of the descriptive tools of the Taxis design and requirements languages. More details about the design [...] discussed in [Greenspan82a] and [Greenspan83].

3.1 The Taxis Conceptual Model
3.1.1 Objects

As  indicated  earlier,  a  fundamental  feature  of  Taxis 
is  that  it  provides  a  conceptual  framework  for  modeling  the 
world. The framework is object-centered, i.e. it provides
objects  as  the  basic  building  block  for  constructing 
semantic  descriptions. 

All objects are stratified into classification levels according to whether they are considered individuals (called tokens), collections of individuals (called classes), classes of classes (called metaclasses), and so on. The tokens of a class are called its instances; similarly, a class is said to be an instance of a metaclass.

Simple  examples  of  object  tokens  are  john-smith 
(representing  a  particular  person)  and  7  (representing  the 
number 7). PERSON is an example of a class, whose
instances  are  tokens  that  represent  persons  such  as 
john-smith;  INTEGER  is  also  a  class,  with  integer  objects 
such  as  7  as  instances. 

An  example  of  a  metaclass  is  PERSON-CLASS  whose 
instances  are  classes  of  persons  such  as  PERSON,  PATIENT, 
EMPLOYEE,  etc. 

We  will  use  lower  case  letters  for  token  identifiers 
and  upper  case  letters  for  class  and  metaclass  identifiers. 
The suffix "-CLASS" will be used for metaclass identifiers.
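The stratification into tokens, classes, and metaclasses can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the Taxis implementation: the `TaxisObject` type and its `make_instance_of` method are hypothetical names, and we assume that an object may only be an instance of an object exactly one classification level above it.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based hashing, so objects can be kept in sets
class TaxisObject:
    """Hypothetical stand-in for a Taxis object.

    level 0 = token, 1 = class, 2 = metaclass, and so on.
    """
    name: str
    level: int
    instance_of: set = field(default_factory=set)

    def make_instance_of(self, cls: "TaxisObject") -> None:
        # Assumed rule: classification links adjacent levels only.
        if cls.level != self.level + 1:
            raise ValueError(
                f"{self.name} (level {self.level}) cannot be an instance "
                f"of {cls.name} (level {cls.level})"
            )
        self.instance_of.add(cls)


# Naming convention from the text: lower case for tokens, upper case for
# classes, and the suffix -CLASS for metaclasses.
PERSON_CLASS = TaxisObject("PERSON-CLASS", 2)
PERSON = TaxisObject("PERSON", 1)
PATIENT = TaxisObject("PATIENT", 1)
INTEGER = TaxisObject("INTEGER", 1)
john_smith = TaxisObject("john-smith", 0)
seven = TaxisObject("7", 0)

PERSON.make_instance_of(PERSON_CLASS)
PATIENT.make_instance_of(PERSON_CLASS)
john_smith.make_instance_of(PERSON)
seven.make_instance_of(INTEGER)
```

Under this assumption, `john_smith.make_instance_of(PERSON_CLASS)` raises `ValueError`, since a token is two levels below a metaclass.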

3.1.2 Properties - Factual and Structural

Objects can be related to each other through factual properties, intended to represent binary relationships in the world. A factual property has three components: a subject, an attribute and a value. To represent the fact that John Smith's age is 26, we use the factual property specified by the triple

<john-smith,  age,  26> 

where  the  subject,  attribute  and  value  are  respectively 
john-smith,  age  (an  identifier)  and  26. 

Generic  information  that  pertains  to  all  instances  of  a 
class  or  metaclass  is  represented  through  structural 
properties  associated  with  a  (meta)class.  For  example,  the 
triple 

[PERSON, age, AGE-VALUE]

represents the general statement that every person has an age, by specifying that every instance of the class PERSON has an age whose value is an instance of AGE-VALUE. (Note that we have used square and angular brackets to distinguish the two types of properties.)

It  may  be  helpful  to  think  of  a  structural  property  as 
a  function,  e.g. 

age: PERSON -> AGE-VALUE

whose  domain  is  the  property  subject  and  whose  range  is 
the  property  value.  Evaluation  of  the  function  for  an 
instance  of  the  domain  results  in  accessing  the 
corresponding  factual  property,  e.g. 

age(john-smith)  =  26 
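The reading of a structural property as a function over the stored factual properties can be sketched as below. We assume, for illustration only, that factual properties are kept as a set of Python tuples and that each attribute has at most one value per subject; `evaluate` is a hypothetical helper, not part of Taxis.

```python
# Factual properties <subject, attribute, value>, stored as plain tuples.
facts = {
    ("john-smith", "age", 26),
}


def evaluate(attribute, subject, facts):
    """Apply the structural property `attribute` as a function to `subject`.

    Returns the value of the matching factual property, or None when the
    property is undefined for that subject. Assumes single-valued attributes.
    """
    for s, a, v in facts:
        if s == subject and a == attribute:
            return v
    return None


print(evaluate("age", "john-smith", facts))  # 26, i.e. age(john-smith) = 26
```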

An  important  feature  of  structural  properties  is  that 
they  determine  what  can  and  cannot  go  in  a  Taxis  model. 
Thus,  in  general,  if  C  is  a  (meta)class  with  structural 
property  [C,  a,  V],  for  every  instance  x  of  C  it  must  be  the 
case  that  a(x)  is  either  undefined  or  an  instance  of  V.  This 
we  call  the  Property  Induction  Principle  and  it  is  one  of 
several  that  define  a  basic  structure  for  models  described 
in  the  Taxis  framework.  (Some  other  principles  are  those 
concerning  the  IS-A  hierarchy  and  are  discussed  below.) 

Since  classes  are  themselves  objects  and  instances 
of  metaclasses,  they  can  have  factual  properties  too.  For 
example,  the  information 

"The average age of persons is 33"

or 

"There  are  200  nurses" 

can  be  represented  with  the  factual  properties 

<PERSON, average-age, 33>

<NURSE,  cardinality,  200>. 

The  inclusion  of  metaclasses  in  the  framework  allows  the 
property  induction  principle  to  apply  to  these  factual 
properties  as  well.  This  dictates  the  presence  of  structural 
properties  such  as 

[PERSON-CLASS,  average-age,  AGE-VALUE] 
[PERSON-CLASS,  cardinality,  NUMBER] 

in  a  model  that  wants  to  talk  about  the  average  age  or  the 
size  of  different  person  groups. 
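A checker for the Property Induction Principle can be sketched in a few lines. Because classes are treated as objects, the same check covers both token-level facts such as <john-smith, age, 26> and class-level facts such as <NURSE, cardinality, 200>. The membership tests in `members` are hypothetical stand-ins (in particular, the 0 to 150 range for AGE-VALUE is our assumption); a real Taxis model would derive them from its own class definitions.

```python
# Hypothetical membership test for each (meta)class.
members = {
    "PERSON": lambda x: x in {"john-smith"},
    "PERSON-CLASS": lambda x: x in {"PERSON", "NURSE"},
    "AGE-VALUE": lambda x: isinstance(x, int) and 0 <= x <= 150,  # assumed range
    "NUMBER": lambda x: isinstance(x, (int, float)),
}

# Structural properties [C, a, V] and factual properties <x, a, value>.
structural = [
    ("PERSON", "age", "AGE-VALUE"),
    ("PERSON-CLASS", "average-age", "AGE-VALUE"),
    ("PERSON-CLASS", "cardinality", "NUMBER"),
]
factual = [
    ("john-smith", "age", 26),
    ("PERSON", "average-age", 33),
    ("NURSE", "cardinality", 200),
]


def no_members(_x):
    """Membership test used for classes we know nothing about."""
    return False


def induction_violations(structural, factual, members):
    """Return the factual properties that break the Property Induction Principle.

    For every [C, a, V] and every instance x of C, a(x) must be undefined
    (no fact recorded) or an instance of V.
    """
    bad = []
    for c, attr, v in structural:
        in_c = members.get(c, no_members)
        in_v = members.get(v, no_members)
        for subject, a, value in factual:
            if a == attr and in_c(subject) and not in_v(value):
                bad.append((subject, a, value))
    return bad


print(induction_violations(structural, factual, members))  # []
```

A fact such as <john-smith, age, -5> or <NURSE, cardinality, "many"> would be reported, because its value is not an instance of the value class named by the corresponding structural property.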

3.1.3 Abstraction Mechanisms

So  far  we  have  discussed  two  of  the  three 
abstraction  mechanisms  supported  by  the  Taxis  framework. 
The  classification  abstraction  is  supported  by  the 
instance-of relation between objects. Moreover, one can think of an object as an aggregate of all of its properties (e.g., a person can be seen as an aggregate of a name, an address, an age, a social insurance number, etc.). Properties are thus offered in support of the aggregation abstraction. The property induction principle relates these two abstractions in a coherent way.

The  third  abstraction  mechanism,  generalization,  is 
supported  by  the  is-a  relation  that  can  be  defined  between 
classes  and  metaclasses  according  to  their 
generality/specificity.  For  example,  the  classes  PATIENT, 
NURSE  and  DOCTOR  are  all  specializations  (i.e.,  is-a  related) 
to  PERSON: 

(PATIENT,  is-a,  PERSON) 

(NURSE,  is-a,  PERSON) 

(DOCTOR,  is-a,  PERSON) 

Clearly, is-a has to be a partial order relation which defines a hierarchy (IS-A hierarchy) for classes and metaclasses.
Two  important  features  of  the  is-a  relation  are: 

a. If (C, is-a, D) then every instance of C must also be an instance of D.

b. If (C, is-a, D) and D has a structural property [D, a, V], then C must also have a structural property [C, a, V'], where V' is a specialization of V, i.e. (V', is-a, V).
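Both features of is-a can be checked mechanically. The sketch below, written under our own assumptions, represents is-a as a set of (subclass, superclass) pairs, computes its reflexive-transitive closure, and then tests feature (a) for an instance and feature (b) for redeclared structural properties; the token `dr-jones` and the helper names are hypothetical. A subclass that does not redeclare a property simply inherits it unchanged, so only explicit redeclarations need checking.

```python
# is-a pairs (C, D) and structural properties [C, a, V] from the running example.
is_a = {
    ("PATIENT", "PERSON"),
    ("NURSE", "PERSON"),
    ("DOCTOR", "PERSON"),
    ("DOCTOR-AGE-VALUE", "AGE-VALUE"),
}
structural = {
    ("PERSON", "age", "AGE-VALUE"),
    ("DOCTOR", "age", "DOCTOR-AGE-VALUE"),
}
# Most specific classes of each token (hypothetical data).
direct = {"dr-jones": {"DOCTOR"}}


def specializes(c, d, is_a):
    """True if c == d or c is-a ... is-a d. Assumes is-a is acyclic."""
    if c == d:
        return True
    return any(specializes(parent, d, is_a) for child, parent in is_a if child == c)


def is_instance(x, d, direct, is_a):
    """Feature (a): x is an instance of d if one of its classes specializes d."""
    return any(specializes(c, d, is_a) for c in direct.get(x, ()))


def feature_b_violations(is_a, structural):
    """Feature (b): a redeclared [C, a, V'] must have V' specializing V
    for every superclass D of C that declares [D, a, V]."""
    bad = []
    for c, attr, v_sub in structural:
        for d, attr_d, v in structural:
            if (attr == attr_d and c != d and specializes(c, d, is_a)
                    and not specializes(v_sub, v, is_a)):
                bad.append((c, attr, v_sub, v))
    return bad
```

Here `is_instance("dr-jones", "PERSON", direct, is_a)` is true and `feature_b_violations` reports nothing; adding [NURSE, age, NUMBER] would be flagged, because NUMBER is not declared to be a specialization of AGE-VALUE.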

The second feature of is-a suggests that since PERSON has an age structural property, DOCTOR must also have an age property

[DOCTOR, age, DOCTOR-AGE-VALUE]

and that DOCTOR-AGE-VALUE is, as one would expect, more specialized than AGE-VALUE, i.e. it has fewer instances (say, persons under 18 can't be doctors).

3.2  The  Design  Level 

We  now  present  a  few  examples  of  classes  and 
metaclasses  from  the  design  level  of  Taxis,  to  give  the 
reader  at  least  a  feeling  for  what  is  involved  in  using  the 
design  language. 

Let's  begin  by  defining  the  metaclass  PERSON-CLASS 
before  we  proceed  to  define  some  of  its  instances: 

metaclass PERSON-CLASS with
   association
      average-age: AGE-VALUE,
      cardinality: NUMBER,
end (PERSON-CLASS)

Figure 1

Each property in a definition falls into some property category. The two structural properties of PERSON-CLASS have been assigned to the association property category, which means that corresponding factual properties can change their values over time. In general, each structural property belongs to a unique category, and this determines what can be done to the values of that property.


Next  we  define  the  classes  PERSON,  NURSE  and 
PATIENT  in  Figure  2: 

PERSON-CLASS PERSON with
   key: (si#),
   characteristic
      si#: SOCIAL-INSURANCE#,
      ohip#: OHIP#,
      name: PERSON-NAME,
   association
      age: AGE-VALUE,
      address: ADDRESS,
end (PERSON)

PERSON-CLASS NURSE is-a PERSON with
   association
      dept: HOSPITAL-DEPARTMENT,
      salary: DOLLAR-AMOUNT,
      supervisor: HEAD-NURSE,
end (NURSE)

PERSON-CLASS PATIENT is-a PERSON with
   key: (hosp-ad#, ohip#),
   characteristic
      hosp-ad#: HOSPITAL-ADMISSION#,
   association
      ward: HOSPITAL-WARD,
      doctor: DOCTOR,
end (PATIENT)

Figure 2


Characteristic properties, such as si# and ohip#, cannot change value once they are specified. Key properties, on the other hand, define one-to-one mappings from a property value (e.g. SOCIAL-INSURANCE#) to a property subject (e.g. PERSON), and are thus analogous to record or tuple keys. Note that a data class may have several keys (e.g. PATIENT, which has three: its own hosp-ad# and ohip#, plus si# inherited from PERSON), or none at all.
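The difference between characteristic, association, and key properties can be illustrated with a small runtime sketch. The `DataClassStore` class below is hypothetical and is not how Taxis is implemented; it simply enforces, on each assignment, that characteristic properties never change once set and that each key value identifies at most one instance, while association properties may be updated freely.

```python
class DataClassStore:
    """Hypothetical store for one Taxis-style data class (not the Taxis runtime).

    - characteristic properties cannot change once they have a value;
    - association properties may change at any time;
    - each key property maps its values one-to-one onto instances.
    """

    def __init__(self, keys, characteristic, association):
        self.keys = set(keys)
        self.characteristic = set(characteristic)
        self.association = set(association)
        self.instances = {}                          # instance id -> {property: value}
        self.key_index = {k: {} for k in self.keys}  # key -> {value: instance id}

    def assign(self, oid, prop, value):
        if prop not in self.keys | self.characteristic | self.association:
            raise KeyError(f"unknown property {prop}")
        props = self.instances.get(oid, {})
        if prop in self.characteristic and prop in props and props[prop] != value:
            raise ValueError(f"characteristic property {prop} cannot change")
        if prop in self.keys:
            owner = self.key_index[prop].get(value)
            if owner is not None and owner != oid:
                raise ValueError(f"key {prop}={value!r} already belongs to {owner}")
            if prop in props:                        # drop the old key value
                del self.key_index[prop][props[prop]]
            self.key_index[prop][value] = oid
        props[prop] = value
        self.instances[oid] = props

    def lookup(self, key, value):
        """Find the instance identified by a key value, or None."""
        return self.key_index[key].get(value)


# PERSON from Figure 2: si# is both a key and a characteristic property.
person = DataClassStore(
    keys=["si#"],
    characteristic=["si#", "ohip#", "name"],
    association=["age", "address"],
)
person.assign("p1", "si#", "123-456-789")
person.assign("p1", "age", 26)
person.assign("p1", "age", 27)  # allowed: age is an association property
```

Assigning the same si# to a second instance, or changing the si# of p1, raises `ValueError`; neither attempt modifies the store.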

Transaction classes define operations that can be performed on the data classes of a Taxis specification. For example, the transaction ADMIT-PATIENT takes a person as parameter and makes it a patient after checking certain conditions (prerequisites) to make sure that admission is possible:

transaction ADMIT-PATIENT with
   parameters: (p),
   local
      p: PERSON,
   prerequisite
      has-ohip?: ohip#(p) ~= nothing,
      space?: admissions < max-admissions,
   action
      admit: insert p in PATIENT,
      update: admissions <- admissions + 1,
end (ADMIT-PATIENT)

Figure 3


All  information  relevant  to  the  ADMIT-PATIENT  transaction  is 
specified through structural properties of different categories. Thus the parameters property specifies the parameter list for the transaction, local specifies local variables, and prerequisite indicates conditions that must be checked before each execution of the transaction. Finally, action properties specify the "body" of the transaction, i.e. the actions to be carried out.
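To make the four property categories concrete, here is one possible Python reading of Figure 3. It is only a sketch under assumed data structures: the `Hospital` record, its `max_admissions` default, and the `PrerequisiteFailed` exception are our own illustrative choices, and a missing OHIP entry plays the role of Taxis `nothing`. All prerequisites are checked before any action runs, matching the description above.

```python
from dataclasses import dataclass, field


@dataclass
class Hospital:
    patients: set = field(default_factory=set)  # extent of data class PATIENT
    ohip: dict = field(default_factory=dict)    # person -> OHIP#; missing = nothing
    admissions: int = 0
    max_admissions: int = 100                   # assumed limit


class PrerequisiteFailed(Exception):
    """Raised with the name of the first prerequisite that does not hold."""


def admit_patient(h: Hospital, p: str) -> None:
    # parameters: (p)
    # prerequisite has-ohip?: ohip#(p) ~= nothing
    if h.ohip.get(p) is None:
        raise PrerequisiteFailed("has-ohip?")
    # prerequisite space?: admissions < max-admissions
    if not h.admissions < h.max_admissions:
        raise PrerequisiteFailed("space?")
    # action admit: insert p in PATIENT
    h.patients.add(p)
    # action update: admissions <- admissions + 1
    h.admissions += 1
```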

Transaction  classes,  like  all  other  types  of  classes, 
can  be  specialized  in  terms  of  the  is-a  relation.  For  instance, 
we  may  want  to  define  a  specialization  of  the  ADMIT-PATIENT 
transaction  which  admits  patients  to  be  operated  on  and 
requires  special  conditions  and  actions  such  as  making  sure 
the  patient  has  done  all  necessary  tests: 

transaction ADMIT-SURGERY-PATIENT is-a ADMIT-PATIENT with
   prerequisite
      tests?: PATIENT-TESTED(p),
end (ADMIT-SURGERY-PATIENT)

Figure 4

According to the semantics of the is-a relation, the ADMIT-SURGERY-PATIENT transaction inherits all the properties of ADMIT-PATIENT and has an additional prerequisite (tests?).
We  consider  this  feature  of  the  Taxis  framework 
particularly  useful  because  it  helps  the  designer  of  an 
information  system,  such  as  a  hospital  admission  system, 
conceptualize  the  large  number  of  conditions,  rules  and 
regulations  and  structure  his  specification  accordingly. 
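The way ADMIT-SURGERY-PATIENT picks up the prerequisites of ADMIT-PATIENT can be mimicked with ordinary class inheritance. In this sketch, which is our own and not Taxis syntax, each class lists only the prerequisites it declares itself, and the hypothetical helper `inherited_prerequisites` merges them along the is-a chain (Python's method resolution order), most general class first.

```python
class AdmitPatient:
    # Each prerequisite is named; the conditions are kept as text for display.
    prerequisites = {
        "has-ohip?": "ohip#(p) ~= nothing",
        "space?": "admissions < max-admissions",
    }


class AdmitSurgeryPatient(AdmitPatient):
    # Only the additional prerequisite is declared here.
    prerequisites = {"tests?": "PATIENT-TESTED(p)"}


def inherited_prerequisites(cls):
    """Merge the prerequisites declared on cls and all of its superclasses."""
    merged = {}
    for klass in reversed(cls.__mro__):  # object first, cls last
        merged.update(vars(klass).get("prerequisites", {}))
    return merged


print(list(inherited_prerequisites(AdmitSurgeryPatient)))
# ['has-ohip?', 'space?', 'tests?']
```

With this merge rule, a subclass that redeclared a prerequisite under an existing name would replace the inherited one; that is a choice made by the sketch, not something the text specifies.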

We  all  know  that  rules  and  regulations  in  a  social 
setting,  such  as  a  hospital  environment,  are  bound  to  have 
exceptions.  Taxis  provides  the  designer  of  an  information 
system with the capability of specifying exception classes
and of associating them with transaction prerequisites. If a
prerequisite  is  found  to  be  false  when  its  transaction  is 
called,  the  corresponding  exception  class  is  instantiated 
and  an  appropriate  exception  handler,  also  a  transaction,  is 
called  as  a  substitute  for  the  initial  transaction  call.  We  view 
exceptions  as  an  additional  abstraction  which  can  help  the 
designer  deal  with  overabstraction  with  respect  to  one  of 
the  other  abstraction  dimensions.  The  availability  of 
exceptions  encourages  the  designer  to  first  deal  with  the 
"normal"  cases,  the  "rules",  and  then  worry  about  the  things 
that  can  go  wrong  and  how  they  should  be  dealt  with.  It  must 
be  added  here  that  our  exception  mechanism  is  fairly  limited 
and  requires  extensions  in  order  to  handle  some  of  the 
exceptional  situations  we  have  encountered  in  a  hospital 
setting. 
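The substitution of an exception handler for a failing transaction call can be sketched as follows. The exception classes `NoOhip` and `NoSpace` and the handlers `refer_to_ohip_office` and `add_to_waiting_list` are hypothetical; the text does not say which exceptions or handlers a real admission system would declare. What the sketch illustrates is the control flow: the first false prerequisite instantiates its exception class, and the associated handler runs instead of the transaction body.

```python
from dataclasses import dataclass


@dataclass
class NoOhip:        # exception class associated with has-ohip?
    person: str


@dataclass
class NoSpace:       # exception class associated with space?
    person: str


def refer_to_ohip_office(db, exc):   # hypothetical handler transaction
    db["log"].append(f"refer {exc.person} to OHIP office")
    return "referred"


def add_to_waiting_list(db, exc):    # hypothetical handler transaction
    db["waiting"].append(exc.person)
    return "waiting"


# prerequisite name -> (condition, exception class, handler)
ADMIT_PATIENT_PREREQUISITES = {
    "has-ohip?": (lambda db, p: db["ohip"].get(p) is not None,
                  NoOhip, refer_to_ohip_office),
    "space?": (lambda db, p: db["admissions"] < db["max_admissions"],
               NoSpace, add_to_waiting_list),
}


def call_admit_patient(db, p):
    for condition, exc_class, handler in ADMIT_PATIENT_PREREQUISITES.values():
        if not condition(db, p):
            # Instantiate the exception class; its handler replaces the call.
            return handler(db, exc_class(person=p))
    db["patients"].add(p)     # action admit
    db["admissions"] += 1     # action update
    return "admitted"
```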

The facilities we have described so far allow the modeling of entities and operations on those entities. An essential component of any social system involves long-term processes, such as the process of dealing with a patient from the moment he enters the hospital until the moment he leaves. The Taxis framework provides script classes as a modeling tool for long-term processes. A script is a modified Petri net consisting of states and transitions. For example, Figure 5 (from [DiMarco83]) shows a script for patients who need a (heart) pacemaker.


[Figure 5: PATIENT-SCRIPT]

For  every  pacemaker  patient  there  is  an  instance  of  this 
script  which  remains  active  from  the  time  he  (or  his  doctor) 
decides  to  have  a  pacemaker  implanted  until  the  time  he 
stops  the  treatment.  Figure  5  only  shows  states  and 
transitions;  it  does  not  show  invariant  assertions,  which  can 
be  associated  with  states  and  which  must  be  true  while  a 
state  is  active.  It  also  omits  the  actions  associated  with 
each  transition,  such  as  creating  another  script  instance, 
communicating with another script instance, accessing or
modifying  the  database,  etc. 
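A script instance of the kind shown in Figure 5 can be sketched as a small Petri-net-like state machine. This is an illustration under stated assumptions, not the Taxis script semantics. The states play the role of places. A transition fires when all its input states are active and its guard holds. Each transition carries an action, and invariants attached to states must hold while those states are active. The state names, `implant` action and guard are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


def _always(db: dict) -> bool:
    return True


def _nothing(db: dict) -> None:
    return None


@dataclass
class Transition:
    name: str
    inputs: frozenset
    outputs: frozenset
    guard: Callable[[dict], bool] = _always
    action: Callable[[dict], None] = _nothing


@dataclass
class ScriptInstance:
    transitions: list
    invariants: dict                     # state -> assertion that must hold while active
    active: set = field(default_factory=set)

    def check_invariants(self, db: dict) -> None:
        for state in self.active:
            assertion = self.invariants.get(state)
            if assertion is not None and not assertion(db):
                raise AssertionError(f"invariant of state {state!r} violated")

    def step(self, db: dict) -> bool:
        """Fire the first enabled transition; return False if none is enabled."""
        for t in self.transitions:
            if t.inputs <= self.active and t.guard(db):
                self.active -= t.inputs
                self.active |= t.outputs
                t.action(db)                 # e.g. modify the database
                self.check_invariants(db)
                return True
        return False


# Hypothetical fragment of a pacemaker-patient script.
def implant(db: dict) -> None:
    db["implanted"] = True


db = {"implanted": False}
script = ScriptInstance(
    transitions=[
        Transition("implant", frozenset({"awaiting_implant"}),
                   frozenset({"follow_up"}), action=implant),
        Transition("stop", frozenset({"follow_up"}), frozenset({"done"}),
                   guard=lambda db: db.get("stop_requested", False)),
    ],
    invariants={"follow_up": lambda db: db["implanted"]},
    active={"awaiting_implant"},
)
fired_first = script.step(db)    # fires "implant"
fired_second = script.step(db)   # False: the "stop" guard does not hold yet
print(script.active)             # {'follow_up'}
```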
Our script construct, based on [Zisman78] and
originally proposed in [Barron80, 82], is further refined in
[Pilote83a, 83b]. One of its distinguishing characteristics is
that  it  offers  communication  facilities  (as  proposed  in 
[Hoare78]),  so  that  script  instances  can  communicate  with 
each  other  for  synchronization  and  information-sharing 
purposes.  Another  unique  characteristic  is  that  scripts  are 
integrated  completely  into  the  Taxis  framework.  Thus,  script 
classes  are  organized  into  an  IS-A  hierarchy  according  to 
their  generality/specificity,  have  their  states  and 
transitions  defined  in  terms  of  structural  properties,  and 
their  instances  can  be  accessed  through  the  same  facilities 
used  to  access  instances  of,  say,  data  classes. 
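Communication between script instances in the style of [Hoare78] can be approximated with a synchronous channel, where the sender blocks until a receiver has taken the message. The sketch below illustrates that assumption using Python threads. `Channel` and the two script functions are hypothetical. They say nothing about how Taxis actually implements script communication.

```python
import threading


class Channel:
    """Synchronous (rendezvous) channel: send() blocks until a receiver takes the value."""

    def __init__(self):
        self._send_lock = threading.Lock()          # one sender at a time
        self._item_ready = threading.Semaphore(0)
        self._item_taken = threading.Semaphore(0)
        self._item = None

    def send(self, value):
        with self._send_lock:
            self._item = value
            self._item_ready.release()   # let one receiver proceed
            self._item_taken.acquire()   # wait until it has taken the value

    def receive(self):
        self._item_ready.acquire()       # wait for a sender
        value = self._item
        self._item_taken.release()       # unblock the sender
        return value


# Two hypothetical script instances synchronizing on one message.
log = []
channel = Channel()


def patient_script():
    log.append("patient: implant done")
    channel.send({"event": "implant_done", "patient": "Smith"})
    log.append("patient: follow-up")


def billing_script():
    message = channel.receive()
    log.append(f"billing: bill {message['patient']}")


threads = [threading.Thread(target=patient_script),
           threading.Thread(target=billing_script)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(log[0])   # patient: implant done
```

Because the send is synchronous, the patient script cannot proceed to its follow-up step until the billing script has received the message. This serves the synchronization purpose that plain message queues would not.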

3.3  The  Requirements  Specification  Level 

This level offers, through RML, facilities for
requirements modeling. A fundamental premise of the Taxis
project is that one conceptual framework, such as that
adopted here, can be used at both the design and the
requirements specification levels. What does change as one
moves from one level to the other are the kinds of classes
one has at his disposal as he constructs his specification.

For the design level these are data, transaction, exception
and script classes. For the requirements level, on the other
hand, they are data (entity), activity and assertion
classes. Informally, data classes correspond fairly directly
to Taxis data classes. Activity classes are intended to
model both instantaneous actions and long-term events and
correspond to transactions and scripts. Finally, assertions
are logical formulas making statements about the world, the
rest of the requirements specification, or the relation
between the two. Assertions serve as property values in
several useful property categories. They are meant to
replace the expressions and statements of the design
language and, as one might expect, to make the requirements
level less procedural.

We present here only one example, an activity class for
the event of admitting a patient:

    activity ADMIT-PATIENT with
      input
        p: PERSON,
      control
        w: WARD,
        phys, consulting-phys: DOCTOR,
      output
        pt: PATIENT,
      triggered-by
        arr: ARRIVAL(p),
      precondition
        already-in?: not (p instance-of PATIENT),
        ohip?: not (ohip#(p) = undefined),
        space?: cardinality(PATIENT) < maxpatient,
      postcondition
        admitted?: IN-HOSPITAL(p),
        patient?: (pt = p) and (p instance-of PATIENT),
      part
        increment: INCREMENT(cardinality(PATIENT)),
        put: CHOOSE-WARD(p, w, phys, consulting-phys),
    end ADMIT-PATIENT

Figure 6

The activity defined in Figure 6 has as input a
person, as output a patient, and three control properties (w,
phys and consulting-phys) which specify objects that are
accessed but not modified by the activity. An instance
is created whenever the triggered-by assertion becomes
true.

Preconditions and postconditions relate the activity
to assertions which must be true at the start and at the end
of the activity, respectively. These have different
semantics from Taxis preconditions and postconditions. In
Taxis, these conditions are checked sequentially during
execution (raising exceptions when a false condition is
encountered). In RML, however, there is no notion of control
flow. Instead, the RML conditions form part of the definition
of the activity and state that for activities in this class the
preconditions are true at the start time and the
postconditions are true at the end time. In this sense, RML
abides by the frequently stated proverb: the requirements
should express WHAT the system should do but not HOW.
Taxis is used to express HOW.

Lastly, we note that activities have parts, whose
occurrences form part of the occurrence of the overall
activity. There is no built-in assumption about when or if the
parts occur with respect to the overall activity. Instead,
the specifier may give constraints (using assertions) to
express, for example, that one part must happen before
another. Alternatively, the specifier may state that the outputs from, say, two parts are inputs
to a third. In that case the inference can be drawn (but
need not be explicitly stated) that the third part cannot
finish before the first two have produced their outputs.

When an RML activity is realized in Taxis, all of the
objects and constraints of the RML specification must be
designed into the Taxis program. Clearly, the designers
have quite a free rein to impose design decisions, as long
as they do not design a Taxis program that violates these
constraints. More strongly, the designed system is
responsible for satisfying the RML specification. That is,
all RML objects must be somehow represented in
the Taxis program, and all constraints must be
maintained. We might add that RML activities do not
necessarily get translated directly into Taxis scripts or
transactions. The information specified by an activity
might be allocated in any number of ways to Taxis objects
and properties; e.g., RML constraints may become Taxis
preconditions. Similarly, an RML data object may not become
a Taxis data class, but might, for example, be implemented
as a value returned by a transaction.
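The contrast between Taxis-style checking and RML-style definition can be sketched in Python. Here an activity class does not execute its conditions. It only decides whether a recorded occurrence belongs to the class, where an occurrence is a snapshot of the world at its start and end times. A second helper derives the ordering between parts that is implied by their inputs and outputs, as in the inference described above. All names, and the simplified ADMIT-PATIENT conditions, are hypothetical, and this is not RML's formal semantics.

```python
from dataclasses import dataclass


@dataclass
class Occurrence:
    start_state: dict   # snapshot of the world at the activity's start time
    end_state: dict     # snapshot of the world at its end time


@dataclass
class ActivityClass:
    preconditions: dict    # name -> predicate on the start snapshot
    postconditions: dict   # name -> predicate on the end snapshot

    def violations(self, occ: Occurrence) -> list:
        """Names of conditions the occurrence violates; [] means it is a member."""
        failed = [n for n, p in self.preconditions.items() if not p(occ.start_state)]
        failed += [n for n, p in self.postconditions.items() if not p(occ.end_state)]
        return failed


def implied_order(parts: dict) -> set:
    """parts: name -> (inputs, outputs). Return (a, b) pairs meaning that part b
    consumes an output of part a, so b cannot finish before a has produced it."""
    order = set()
    for a, (_, outs_a) in parts.items():
        for b, (ins_b, _) in parts.items():
            if a != b and outs_a & ins_b:
                order.add((a, b))
    return order


admit = ActivityClass(
    preconditions={"already-in?": lambda s: "p" not in s["patients"]},
    postconditions={"admitted?": lambda s: "p" in s["patients"]},
)
occ = Occurrence({"patients": set()}, {"patients": {"p"}})
print(admit.violations(occ))   # []

parts = {  # hypothetical parts: name -> (inputs, outputs)
    "assess": (set(), {"report"}),
    "schedule": (set(), {"slot"}),
    "operate": ({"report", "slot"}, {"record"}),
}
print(sorted(implied_order(parts)))  # [('assess', 'operate'), ('schedule', 'operate')]
```

Nothing in `ActivityClass` says how or in what order the conditions are established. That is the WHAT/HOW split the text attributes to RML and Taxis.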
4 STATUS AND PROSPECTUS
4.1 Implementation

Nixon has implemented a compiler for Taxis programs
[Nixon83]. The target language is Pascal, augmented with
relational  database  facilities  such  as  those  provided  by 
Pascal/R  [Schmidt80]. 

Taxis  data  hierarchies  are  translated  into  relational 
schemata,  and  hierarchies  of  transactions  are  translated 
into  the  block  structure  of  Pascal.  The  output  of  the 
compiler  is  a  program  containing  definitions  of  all  classes 
and  transactions,  routines  to  enforce  constraints,  a 
database  interface  (as  provided  by  Pascal/R),  and  a  near- 
empty  database. 
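The translation of data hierarchies into relational schemata can be illustrated with one common scheme. Each class gets one relation, holding a surrogate key and the attributes the class declares locally. Each subclass relation shares the key of its superclass. [Nixon83] may use a different mapping. The class names and helper functions below are hypothetical.

```python
# Hypothetical class hierarchy: name -> (superclasses, locally declared attributes).
CLASSES = {
    "PERSON": ([], {"name": "STRING"}),
    "PATIENT": (["PERSON"], {"ward": "WARD"}),
    "HEART_PATIENT": (["PATIENT"], {"pacemaker": "PACEMAKER"}),
}


def relational_schema(classes: dict) -> dict:
    """One relation per class: a shared surrogate key plus local attributes.
    A subclass relation references its superclass relations through that key."""
    schema = {}
    for name, (supers, local) in classes.items():
        columns = {"key": "SURROGATE", **local}
        schema[name] = {"columns": columns, "references": list(supers)}
    return schema


def all_attributes(classes: dict, name: str) -> dict:
    """Inherited plus local attributes, gathered by walking up the IS-A hierarchy.
    Reassembling a full instance corresponds to joining these relations on the key."""
    supers, local = classes[name]
    result = {}
    for s in supers:
        result.update(all_attributes(classes, s))
    result.update(local)
    return result


schema = relational_schema(CLASSES)
print(schema["HEART_PATIENT"])
# {'columns': {'key': 'SURROGATE', 'pacemaker': 'PACEMAKER'}, 'references': ['PATIENT']}
print(list(all_attributes(CLASSES, "HEART_PATIENT")))
# ['name', 'ward', 'pacemaker']
```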

There  were  some  interesting  implementation 
problems  that  had  to  be  solved.  First,  because  classes  can 
have  multiple  superclasses,  there  is  the  possibility  of 
conflicting  inherited  properties.  Second,  since  Taxis 
provides  declarative  facilities  for  specifying  exceptions  and 
exception-handling,  a  mechanism  was  needed  for  safe 
recovery  from  execution-time  conditions  that  would  be 
"fatal  errors"  in  many  programming  languages. 

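The first problem, conflicting inherited properties under multiple superclasses, can be detected mechanically. The sketch below reports a property as conflicting when it reaches a class from different superclasses with different definitions and is not redefined locally. This is one plausible policy, not necessarily the one adopted in the compiler. The class names are hypothetical.

```python
# Hypothetical hierarchy: name -> (superclasses, locally declared properties).
CLASSES = {
    "PERSON": ([], {"name": "STRING"}),
    "EMPLOYEE": (["PERSON"], {"salary": "MONEY", "supervisor": "PERSON"}),
    "PATIENT": (["PERSON"], {"supervisor": "DOCTOR"}),
    "EMPLOYEE_PATIENT": (["EMPLOYEE", "PATIENT"], {}),
}


def properties(classes: dict, name: str) -> dict:
    """All properties visible in `name`; local definitions override inherited ones."""
    supers, local = classes[name]
    props = {}
    for s in supers:
        props.update(properties(classes, s))
    props.update(local)
    return props


def inherited_conflicts(classes: dict, name: str) -> dict:
    """Properties inherited from different superclasses with different definitions
    and not redefined locally. A property reached twice through a common ancestor
    (here `name`, via PERSON) has a single definition and is not a conflict."""
    supers, local = classes[name]
    seen = {}
    for s in supers:
        for prop, definition in properties(classes, s).items():
            seen.setdefault(prop, set()).add(definition)
    return {p: defs for p, defs in seen.items() if len(defs) > 1 and p not in local}


conflicts = inherited_conflicts(CLASSES, "EMPLOYEE_PATIENT")
print({p: sorted(d) for p, d in conflicts.items()})
# {'supervisor': ['DOCTOR', 'PERSON']}
```

Under this policy the designer resolves a reported conflict by redefining the property in the subclass itself.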
An interactive environment has been designed and
implemented by O'Brien [O'Brien82]. It includes a
class-oriented editor, whose commands and functionality are
centered on Taxis classes; a semantic consistency verifier
which ensures that Taxis programs conform to the semantic
rules of Taxis; and an interpreter and debugger for
prototyping. The editor provides the information system
designer with facilities to construct, inspect, and modify a
Taxis program. The consistency verifier performs various
checks to ensure the correctness of the conceptual model
being specified. The interpreter simulates execution of
Taxis programs, and the debugger assists the designer in
validating the model. The design environment also provides
various  other  aids  to  the  user,  such  as  an  online  help 
facility,  a  documentation  generator,  and  a  way  of  keeping 
track  of  multiple  versions  of  models. 

The  implementation  of  the  Taxis  compiler  is  being 
extended  to  handle  the  execution  of  scripts  [Chung83]. 
Some  interesting  problems  arise  in  implementing  scripts  due 
to  their  generality.  Scripts  allow  the  modeling  of  concurrent 
processes,  and  they  also  allow  the  user  to  specify  that  the 
system  should  "wake  up"  at  a  particular  time  or  after  a 
certain  amount  of  time  has  elapsed.  Chung  has  found  an 
efficient  way  to  ensure  that  invariants  are  maintained  in 
every  state,  and  to  check  for  trigger  conditions,  any  number 
of  which  could  become  true  at  any  time. 
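The two script-execution concerns mentioned above can be illustrated together: timed "wake-ups" and trigger conditions that may become true at any time. The sketch below shows one straightforward approach and is not the method of [Chung83]. It keeps timers in a heap so that only the earliest needs inspecting. It also indexes each trigger by the data items it reads, so that an update re-evaluates only the triggers it could affect. `TriggerMonitor` and the pacemaker triggers are hypothetical.

```python
import heapq
from collections import defaultdict


class TriggerMonitor:
    def __init__(self):
        self.db = {}
        self.timers = []                    # heap of (wake_time, seq, name)
        self.by_item = defaultdict(list)    # data item -> [(name, predicate)]
        self.fired = []
        self._seq = 0                       # tie-breaker for equal wake times

    def wake_at(self, time, name):
        heapq.heappush(self.timers, (time, self._seq, name))
        self._seq += 1

    def when(self, items, name, predicate):
        """Register a trigger that depends only on the listed data items."""
        for item in items:
            self.by_item[item].append((name, predicate))

    def update(self, item, value):
        self.db[item] = value
        for name, predicate in self.by_item[item]:   # only affected triggers
            if predicate(self.db):
                self.fired.append(name)

    def advance_clock(self, now):
        while self.timers and self.timers[0][0] <= now:
            _, _, name = heapq.heappop(self.timers)
            self.fired.append(name)


m = TriggerMonitor()
m.wake_at(30, "follow-up visit due")
m.when(["battery_level"], "replace pacemaker", lambda db: db["battery_level"] < 20)
m.update("battery_level", 50)   # trigger evaluated, still false
m.advance_clock(10)             # nothing due yet
m.update("battery_level", 15)   # trigger becomes true
m.advance_clock(30)             # timer fires
print(m.fired)  # ['replace pacemaker', 'follow-up visit due']
```

For simplicity, a trigger here fires on every update that leaves it true. A real implementation would likely fire only on a false-to-true change.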


The Taxis design environment is being expanded to
handle scripts as well [Park83].

4.2 Software Development
4.2.1 Taxonomic Programming

A methodology for specification/modeling should
provide guidance to its users. At the heart of many
software development methodologies lie one or more
abstraction mechanisms, which allow us to ignore details at
some level, plus a refinement principle which provides for
the guided and gradual reintroduction of details across the
abstraction dimension. The aggregation abstraction forms
the core of software design methodologies such as
"stepwise refinement" [Wirth71]. Similarly, the
"implementation" dimension, the one which we have referred
to as "levels", is the basis for approaches such as the
abstract machine approach [Dijkstra72]. The generalization
abstraction has not been exploited very much in Software
Engineering (although Simula does have classes with a
rudimentary inheritance mechanism). Yet it is our
contention that generalization is an invaluable organizational tool for
system description in general, and for requirements
modeling in particular.

The main idea of specification guided by
generalization is that a model can be constructed by
modeling first the most general classes, and then
proceeding to more specialized classes. For example, in
modeling a hospital world, one might consider first the
concepts of patient, doctor, admission, treatment, etc.
Later, the modeler can differentiate between child patients,
heart patients, internists and surgeons, surgical and
medical treatments, etc. At each step, only the information
(properties) appropriate to that level is specified.

Generalization is the appropriate principle to exploit
when the difficulty of modeling is due to a large number of
details rather than to the algorithmic complexity of the
system/world. A hierarchy of classes organized along this
dimension provides a convenient structure for distributing
information (expressed uniformly as properties in RML) and
associating it where it most naturally belongs. Such
"stepwise refinement by specialization" [Borgida82b] is
orthogonal and complementary to the more usual "stepwise
refinement by aggregation", whose main effect is to
decompose complex situations into a number of less
complex ones. Both kinds of refinement are orthogonal and
complementary to a third dimension: the progression from
"world-oriented" specifications to specifications of a more
and more completely implemented system.

4.2.2 Requirements Modeling

4.2.2.1 Relationship of RML to Taxis

Since RML is based on the same organizational
principles as Taxis, it can also exploit stepwise refinement
by specialization. Using RML for requirements definition
should be a much easier task than going directly to a design
model in Taxis, since Taxis models involve more programming
details. Given an RML requirements model, the transition to
Taxis should be relatively straightforward, since both are
based on the same underlying framework [Greenspan82b].
Further work is needed to develop a systematic way of
adding the design decisions that are needed to derive a
Taxis program from an RML model.


4.2.2.2 Relationship of RML to SADT

The difficulty of building a high-level requirements
specification as in RML should not be underestimated. In the
initial stages of requirements definition, all of the parties
involved are faced with the problem of deciding what
concepts and phenomena are relevant to the situation at
hand, agreeing on terminology, and conveying their "mental
models" of the situation to each other. Whatever the
application domain (e.g., medical, tax law, manufacturing),
the knowledge about that application needs to be
understood, documented, and communicated.

We propose, therefore, that requirements be defined
in two steps:

The first would use a language for structured
analysis such as SADT [Ross77b], in which terms are
introduced in an organized way.

The second would use RML for semantic modeling,
which gives definitions of the semantics of the
concepts introduced in the first step.

[Greenspan83] establishes the connection between SADT and RML.
SADT was chosen because it has been widely applied
and commercially successful, and because its simplicity and
symmetry suggest that deeper principles are at work.

SADT provides a way of introducing concepts/terms
into the requirements specification by a process of
stepwise decomposition (expanding a "concept box" into a
"diagram" containing several interconnected boxes). The
result is a hierarchically organized structure of interrelated
terms. SADT is founded on a notion of "system" (data and
activity aspects inter-related by input, output, and control
relationships), so an SADT model may itself be used as a
model of the world/system. The SADT diagrams provide a
"structured lexicon", a sort of roadmap to guide the RML
modeling process.

However, the interpretation of SADT diagrams
depends on the meaning of the words and phrases of the
"embedded" language used to label the boxes and arrows,
usually a natural language such as English. In addition,
information about what each diagram means
appears in an accompanying natural language narration for
that diagram.

RML is used to formalize this information. Classes
and properties are defined to correspond to the terms
introduced in the SADT diagrams. For each feature of a
diagram (box, node, arrow, nodes where arrows split and
join), a concept or constraint is defined in RML. The
specifier decides on the precise meanings of the words in
the SADT model and writes them down in RML. The semantic
relationships expressed in the RML model are constrained
by the connectivity of the SADT diagram from which it is
derived.

RML was designed with SADT in mind, so that data
and activity concepts in RML can be used to give the
semantics of SADT data and activity boxes. The various
uses of assertions in RML provide the expressive power to
express properties of data and activities and any other
constraints.


4.2.3  Prototyping 


The ability to quickly get a prototype system from a
specification can be extremely useful for seeing whether the
designed system specifies the intended behavior and for getting
feedback from users. Prototypes can be especially helpful
for determining what the user really wants. The Taxis
interactive environment already provides some facilities
that support prototyping by interpreting Taxis designs, and
our future implementation plans are consistent with this
philosophy.


4.2.4 Experience

Taxis  was  used  for  describing  a  medical  information 
system  for  the  Pacemaker  Center  at  the  Toronto  General 
Hospital  [DiMarco83].  The  system  keeps  track  of  patients 
who  have  received  a  cardiac  pacemaker.  Its  job  includes 
keeping  accurate  medical  records,  monitoring  the  patients’ 
status,  analyzing  patient  data,  and  providing  reports.  This 
system  is  a  good  example  of  a  system  that  has  a  large 
number  of  details,  and  a  critical  need  for  data  integrity.  The 
semantic  component  of  Taxis  was  used  to  describe 
patients,  pacemakers,  and  medical  data,  and  scripts  were 
used  to  model  the  progress  of  the  patients  through  various 
activities  such  as  several  kinds  of  assessments  and 
treatments,  scheduling  and  administrative  matters. 

The  most  difficult  part  proved  to  be  the  acquisition  of 
knowledge  from  the  hospital  staff.  No  specific  method  was 
used  for  this  task,  and  the  experience  underscored  the 
need  for  a  requirements  modeling  step  to  support  this  task 
before  Taxis  design  was  undertaken.  Once  the  pacemaker 
center  operations  were  well  understood,  programming  in 
Taxis  was  relatively  effortless.  Furthermore,  the  structure 
of the Taxis program strongly reflects the actual life-cycles
of patients and pacemakers.
Taxis has also been used to describe a medical
information system for managing clinical trials [Buchan82a,
Buchan82b]. A clinical trial is a controlled experiment in
which groups of subjects receive medical treatment and the
cause and effect relationship between the intervention and
its outcome is investigated. The system keeps track of the
subjects,  the  procedures,  and  experimental  data;  it  is  used 
to  promote  both  the  safety  of  the  patients  and  adherence 
to  the  trial  protocol. 

Clinical  trials  prove  an  interesting  application  area 
for  Taxis  [Buchan82b]: 

There  is  an  intrinsic  simplicity  and 
similarity  in  clinical  trials,  despite  their  apparent 
overwhelming  variety.  There  are  prevention  and 
intervention  trials;  small  trials  of  a  few  subjects, 
and  large  multi-centre  trials  involving  tens  of 
hundreds  of  institutions  and  thousands  of 
patients.  They  differ  in  such  details  as  inclusion 
or  exclusion  criteria  for  the  acceptance  or 
rejection  of  patients  into  the  trial;  methods  for 
allocating  patients  to  experimental  and  control 
groups;  the  many  types  of  treatment, 
investigation  and  data  collection  steps  that  make 
up  the  protocol;  and  so  on.  But  their  basic 
processes,  in  investigating  the  cause  and  effect 
relationship  between  some  intervention  and  its 
outcome,  are  the  same. 

It  is  necessary  to  capture  and  organize  these  concepts  in  a 
way  that  highlights  the  commonality  between  clinical  trials 
and  suppresses  the  differences.  The  abstraction  facilities 
are  an  aid  in  this  direction,  because  specialized  concepts 
appear  under  more  general  ones  in  the  IS-A  hierarchy,  so 
that  the  properties  that  are  the  most  widely  shared  come 
early  in  the  description  and  more  specialized  information 
comes later. Although Taxis has procedural exception-handling,
the clinical trial scenario points out the need for
additional work on the representation of "exceptional
objects", such as an object that is to be considered an instance
of a class even though it contradicts some
property, as in [Lesperance80].

Given  a  language  such  as  Taxis,  one  can  consider 
the  possibility  of  developing  generic  systems  that  can  be 
tailored  to  a  particular  situation  by  specialization  and  the 
specification  of  exceptions. 

5  CONCLUSION 

The  Taxis  Project  has  designed  and  implemented  a 
variety  of  languages  and  tools  for  requirements  and  design. 
Although they draw on ideas that are popular in Artificial
Intelligence (see [Mylopoulos80a] for a survey) and Data
Base Management (see [McLeod80] for a survey), as well
as in some programming languages, they are all based on the
same object-oriented framework, which uses some
fundamental abstraction mechanisms to structure and
organize specifications.


Software  engineering  is  viewed  as  the  construction 
of  a  series  of  models,  starting  with  a  world-oriented 
requirements model (SADT plus RML), then a Taxis design
model, and ultimately a completely implemented system.
The  task  of  requirements  modeling  is  likened  to  the  task  of 
knowledge  representation  in  Artificial  Intelligence,  and  the 
Taxis  framework  applies  concepts  that  are  popular  in 
Artificial  Intelligence  (as  well  as  in  semantic  data  models)  to 
both  RML  and  the  Taxis  language. 

Other work in the same spirit includes [Bubenko81,
Mittermeir80, Roussopoulos79, Wilson79, Yeh80].
During the past several years there has been considerable
research within AI on various aspects of databases. The
purpose of this panel is to provide non-specialists with an
overview of this research. To that end, I will use this introduction
to survey briefly what I take to be the nature of the field, its
significance for AI, and how it impacts database theory and
practice. Each panelist in turn will then describe in greater
detail the main concerns and issues surrounding his/her
particular research area.


The Nature of the Field

The  area  of  AI  and  databases  divides  naturally  into  the 
following  three  principal  categories.  (Names  in  parentheses 
refer  to  the  position  papers  of  the  corresponding  panelists). 

1.  Database  Interfaces  (King,  Webber) 

The concern here is with providing co-operative, friendly
interfaces between the user and the information stored in the
database. Users might range from database designers and
administrators to end users with varying degrees of
sophistication. As King and Webber point out, a wide variety of AI
issues arise in this setting: query evaluation as a planning
process, natural language input, query clarification, co-operative
responses to requests, etc. Many of these topics are linguistic
in nature and hence overlap considerably with AI's
longstanding concern with natural language. While most natural
language research in databases has focussed on front-ends
(there are even commercial systems available now), a growing
body of literature is addressing the pragmatics of question
asking and answering, as described by Webber.

2. Conceptual Modelling (Mylopoulos)

The objective of this research is to provide natural ways
of representing, organizing and structuring knowledge about
application domains so that a database designer will have a
suitable set of tools (a conceptual model) for translating
his/her conceptualization of an application into a concrete,
structured representation. Typical issues of concern are: How
should one carve up a world into the right conceptual units in
order to facilitate the design of a database? How do these
conceptual units relate to one another? What inferences are
sanctioned? In what formal language should such world views be
represented? These, of course, are standard issues of
knowledge representation in AI, so it is not surprising that
most research in conceptual modelling informs, and is informed
by, such notions as semantic networks, property inheritance,
and knowledge representation languages like KL-ONE, KRL
and FRL.

3.  Logic  and  Deduction  (Gallaire) 

Research  in  this  area  draws  on  longstanding  results  in  AI 
on  logic,  theorem  proving  and  deductive  question-answering. 
A  database  is  viewed  as  a  collection  of  logical  formulae  and 
queries  are  answered  by  proving  the  queries  using  the  formulae 
of  the  database  as  premises.  It  is  this  perspective  which 
underlies  current  work  on  logic  programming  and  databases 
[Kowalski 81].
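The view of query answering as proof can be illustrated with a tiny backward-chaining prover over function-free Horn clauses, in the spirit of logic programming. This is a minimal sketch for illustration only. It has no occurs check and no negation, and it uses a depth-first search that assumes the rules terminate on the given data. The predicate names and facts are hypothetical.

```python
from itertools import count

# The "database": facts plus rules, all Horn clauses.
# Terms beginning with an uppercase letter are variables.
FACTS = [("parent", "ann", "bob"), ("parent", "bob", "cal")]
RULES = [
    (("ancestor", "X", "Y"), [("parent", "X", "Y")]),
    (("ancestor", "X", "Z"), [("parent", "X", "Y"), ("ancestor", "Y", "Z")]),
]


def is_var(term):
    return term[:1].isupper()


def walk(term, subst):
    """Follow variable bindings to their current value."""
    while is_var(term) and term in subst:
        term = subst[term]
    return term


def unify(a, b, subst):
    if len(a) != len(b):
        return None
    subst = dict(subst)
    for x, y in zip(a, b):
        x, y = walk(x, subst), walk(y, subst)
        if x == y:
            continue
        if is_var(x):
            subst[x] = y
        elif is_var(y):
            subst[y] = x
        else:
            return None
    return subst


_fresh = count()


def rename(clause):
    """Give a rule fresh variable names for each use."""
    n = next(_fresh)
    head, body = clause
    r = lambda atom: tuple(f"{t}_{n}" if is_var(t) else t for t in atom)
    return r(head), [r(b) for b in body]


def prove(goals, subst):
    """Yield substitutions under which all goals follow from FACTS and RULES."""
    if not goals:
        yield subst
        return
    first, rest = goals[0], goals[1:]
    for fact in FACTS:
        s = unify(first, fact, subst)
        if s is not None:
            yield from prove(rest, s)
    for clause in RULES:
        head, body = rename(clause)
        s = unify(first, head, subst)
        if s is not None:
            yield from prove(body + rest, s)


# Query: which W satisfy ancestor(ann, W)?
answers = sorted({walk("W", s) for s in prove([("ancestor", "ann", "W")], {})})
print(answers)  # ['bob', 'cal']
```

The answer `cal` is not stored anywhere. It is obtained by proving the query with the stored facts and rules as premises.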

There is a complementary view of the role of logic in
database theory [Reiter 83]. The argument is that logic
provides a language for the abstract specification of conceptual
models for databases, including the modelling of incomplete
knowledge. Such a conceptual model may be realized in any
number of ways by any number of data representations and
operations on these data representations. The importance of
this distinction is that, by respecting it, we are in a position to
prove the correctness of a conceptual model’s realization with
respect to its logical specification. This is a latter-day version
of McCarthy and Hayes’ distinction between the
epistemological and heuristic adequacy of a representation.

Significance  of  Database  Research  for  AI 

The practical significance for AI of flexible, intelligent
databases is obvious. They are essential for any AI application
requiring access to, and modification of, large knowledge bases.
But research in databases also has a less obvious, but equally
important, scientific and methodological significance for AI:
typical database domains, even the most mundane, provide a
very rich source of problems for foundational studies in AI.
Moreover, the worlds that databases are called upon to
represent are often really quite simple, at least compared to,
say, representations for story comprehension, medical
diagnosis, or electronic circuit design. For example,
databases are often concerned with representing such simple
worlds as inventories, office records, airline reservations, etc.
As a result of the simplicity of these worlds, the fundamental
AI problems that arise do so in simple and perspicuous settings,
so that the essential features of these problems reveal
themselves, unobscured by confusing details. Therefore, solutions
to these problems are easier to come by. It is also clearer
exactly what constitutes a solution. In her position paper,
Webber makes a similar point, and shows how various
concerns in the pragmatics of conversation arise with wonderful
clarity in a database setting. By the same token, I think it
was no accident that the major advances in computational
linguistics due to Woods with respect to parsing and semantics
emerged from the design of LUNAR, a natural language
front-end to a database of facts about lunar rock samples.

Many  other  examples  quickly  come  to  mind  of  current 
research  topics  in  AI  which  arise  in  a  particularly  perspicuous 
way  in  the  database  setting: 

1. Non-monotonic reasoning and the closed world assumption
for the treatment of negation in databases [Reiter 78].

2. Knowledge, belief, and incomplete information. The
information in the database represents what it knows or
believes about a world. It may have only partial
information. The database should "know" that there are certain
things it doesn't "know". These problems lead to
metatheoretic solutions [Konolige 81] and modal logics
[Levesque 82].

3. Truth maintenance problems [Doyle 79] arise in databases
subject to updates and deletions, and in the maintenance
of database integrity.

4. Knowledge representation for conceptual modelling
[Mylopoulos 81].
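Item 1 can be made concrete in a few lines of Python. Under the closed world assumption, a fact absent from the database is treated as false. This makes the reasoning non-monotonic, because adding a fact can withdraw an earlier negative conclusion. The open-world variant returns "unknown" instead, which is closer to the partial-information setting of item 2. The flight data and function names are hypothetical.

```python
# Hypothetical stored facts: (flight, origin, destination).
flights = {("AC101", "Toronto", "Paris"), ("AC202", "Toronto", "London")}


def connects_cwa(src: str, dst: str) -> bool:
    """Closed world assumption: a connection that is not recorded is false."""
    return any((origin, dest) == (src, dst) for _, origin, dest in flights)


def connects_owa(src: str, dst: str):
    """Open world: absence of a record means 'unknown' rather than 'false'."""
    return True if connects_cwa(src, dst) else "unknown"


before = connects_cwa("Toronto", "Rome")     # False under the CWA
flights.add(("AC303", "Toronto", "Rome"))    # new information arrives
after = connects_cwa("Toronto", "Rome")      # True: the earlier conclusion is withdrawn
print(before, after)                         # False True
```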

Because  of  examples  like  these  I  believe  that  databases 
will come to be used as a fruitful blocks world for AI research,
a  blocks  world  with  the  nice  additional  property  that  it  has 
enormous  practical  significance. 

Significance  for  Database  Theory  and  Practice 

At the moment there is a wide but narrowing culture gap
between the AI community of database researchers and the
large community of researchers outside of AI who are involved
with database theory and practice. The concerns of this latter
group tend to be far more practical and "low level". They involve
issues like file organization, concurrency, data
models (as opposed to conceptual models; see Mylopoulos'
position paper), efficient retrieval from very large databases,
multiple user views, protection and security, etc. [Ullman 80].
Moreover, there is a deeply rooted reluctance to view a
database as a representation of some world, i.e. as a
representation of knowledge. Mylopoulos characterizes this distinction
between the non-AI and AI perspectives on databases:

"A (non-AI) database stores ... data. In the case of databases
there is no commitment to an interpretation of the
information stored in the database: for (AI) knowledge bases, there
definitely is such a commitment."


This divergence of opinion about the nature of a
database, together with AI’s general lack of concern for efficiency
considerations and large databases, has tended to separate
the two communities. Fortunately, the situation is changing,
mainly  due  to  various  proposals  for  semantic  and  conceptual 
data  models  as  alternatives  to  the  relational  model  [Brodie  and 
Zilles  80,  Tsichritzis  and  Lochovsky  82].  Contributions  from 
logic,  deductive  question-answering,  and  logic  programming  are 
also  beginning  to  be  acknowledged.  The  influence  of  AI  on 
database  theory  and  practice  can  only  increase.  The  result 
will  be  user-friendly,  moderately  intelligent,  efficient,  well 
designed  database  systems.  Their  impact  will  rival  that 
currently  enjoyed  by  expert  systems. 

LOGIC  AND  DATABASES 

Herve  Gallaire 

Laboratoires  de  Marcoussis 
Centre  de  Recherches  de  la  C.G.E. 

Route  de  Nozay 
91460  Marcoussis  -  France 


A logic database (LDB) is a specification, using logic, of
some data model, not necessarily relational. Deductive
databases (DDBs) are a proper subset of LDBs which provide a
natural generalization of relational databases through a
deductive mechanism. We argue that LDBs will play a major role
in the future of database (DB) systems. LDBs are among the
major contributions of AI to the DB field, the others being AI
knowledge representation languages and natural language
techniques.

The contribution of DDBs is far more wide-reaching than was
first thought after the pioneering work on question-answering
systems. DDBs offer the following features: deductive retrieval
of implicit information not stored in the DB; view expression,
access and maintenance (a view is nothing but an external schema
of the DB, derived from its conceptual schema and specific to a
class of users); mappings between various external models,
possibly through a common relational DB enhanced with deductive
capabilities; a trivial embedding of a DB query language in a
programming language, namely that of the deductive component;
hints and possible paths to solutions of non-trivial problems
such as the expression and verification of integrity
constraints, borrowing techniques found in the deductive
component; and semantic as well as syntactic query
transformations geared towards either improving the man-machine
interface or optimizing data retrieval.
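The first of these features, deductive retrieval of implicit information, can be illustrated with a minimal Python sketch. The relation names `reports_to` and `superior` and the sample tuples are hypothetical, and the rules are evaluated by naive bottom-up iteration to a fixpoint; this is only one possible evaluation strategy, not necessarily the one any particular DDB uses.

```python
def derive_superiors(reports_to):
    """Naive bottom-up evaluation of the hypothetical Datalog-style rules
        superior(X, Y) :- reports_to(X, Y).
        superior(X, Z) :- reports_to(X, Y), superior(Y, Z).
    New facts are added until an iteration derives nothing new (the fixpoint)."""
    superior = set(reports_to)  # first rule: every stored fact is a superior fact
    while True:
        # Second rule: join reports_to with the superior facts derived so far.
        new_facts = {(x, z) for (x, y) in reports_to
                     for (y2, z) in superior if y == y2}
        if new_facts <= superior:
            return superior
        superior |= new_facts


# Hypothetical stored (explicit) facts.
REPORTS_TO = {("ann", "bob"), ("bob", "cal"), ("cal", "dee")}
SUPERIOR = derive_superiors(REPORTS_TO)

# Facts that are implied by the rules but never stored.
print(sorted(SUPERIOR - REPORTS_TO))
# A query over the derived relation: all of ann's superiors.
print(sorted(z for (x, z) in SUPERIOR if x == "ann"))
```

Whether such implicit facts should be computed on demand or generated and stored in advance is exactly the kind of open choice the text below lists among the technical problems.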

One may wonder whether these features will eventually appear in
commercial products. Although there is no simple answer to this,
it is felt that, provided enough effort is put into all research
aspects of the problem, much can be achieved. The basic reason
for this controlled optimism is that, for the first time, there
is a continuous set of complementary tools resting on a common
formal basis, i.e. logic: relational database machines with
special operations for DB operators and retrieval, relational DB
systems, and logic programming. Each of these levels already
flourishes with competing products which are even partially
integrated; for instance, relational DB machines on the market do
provide a good interface to existing DB systems. However, a
product such as a PROLOG system at the logic programming level
has not yet been truly marketed; hence it has not been
integrated in a product with a relational DB system. One may
foresee several such prototypes in the near future. The first
phase of the Japanese Fifth Generation project is providing one
such prototype.

This development does not mean that such an integration is
without problems. There are three types of problems to be
solved: technical problems, performance problems, and research
problems. The first type concerns problems whose theory is well
understood but for which experiments are lacking to decide
important issues. Two examples of these issues are decisions
about compiling or interpreting deductive queries, and about
generating or deducing implicit data. The reason why such
experiments have not yet been conducted has to do with the
maturity of DB experts; not too long ago they were (and some
still are) disputing the fact that query languages have to be
extended rather than embedded into programming languages. The
second type of problem, performance, must also be overcome. In
the same way as many DB experts still reject relational systems
for poor performance, deductive DBs are supposed to be
impractical. There is no reason why this should be true; rather,
one should not expect to get a whole set of new features without
extra cost. Clearly, what is to be decided is whether these new
features are worth the overhead, as in the case of relational
database systems. Such a decision will obviously depend on
much-needed experiments, but one is faced with the fact that
either these will be successful or a whole set of new DB markets
will be unfeasible. Finally, there are a number of research
problems. Will DDBs provide enough capabilities for, e.g.,
integrity constraint checking, where it is known that the kind of
logic needed goes, on practical grounds, beyond what is probably
sufficient for purely deductive capabilities? Similarly, will
deductive capabilities cater to technical issues which have no
known answer in conventional DBs, such as correctly handling
incomplete and unknown data, but which are all the more
difficult in a deductive context? Will the theory provide
solutions to a whole new set of problems brought up by
introducing reasoning capabilities into DB systems, such as
questions about non-monotonic reasoning, hypotheses and belief
revision, etc.?

LDBs share all these capabilities with DDBs. But they can help
solve another class of problems which is important in DB
circles: what comes after relational models? Semantic data
models, which parallel knowledge representation languages in AI,
are advocated by many. LDBs offer a rigorous semantics for such
data models and others. This capability is attained by using a
logic programming language as a specification language, while
its execution mechanism provides the deductive capabilities.

There is definitely a future for logic database systems, both for
the user and for the DB administrator. Logic programming is the
key to it. Some references which are representative of the point
of view of this position paper are [Gallaire and Minker 78,
Gallaire et al. 83, Kowalski 81, Reiter 83].
The next generation of intelligent database interfaces will act
as partners in the problem-solving process. The right metaphor
for them is the skilled research librarian or information
retrieval specialist. Producing interfaces with the right
capabilities requires basic research in the representation, use,
acquisition, and coordination of basic facts, general rules, and
metaknowledge within the same information system.

Skilled information retrieval specialists possess a host of
special skills. They have a general knowledge of particular
fields, at the level of classifying and relating their subfields
and knowing the main contributors. They have precise knowledge
about information resources: what is in them, how they are
organized, and what research aids are available. They have
strategies for searching for answers. And they can evaluate the
quality of the information they find based on its source and
their general knowledge of the field.

When presented with a problem, the information specialist
determines whether his resources are likely to provide an answer.
It may be necessary to refine the problem statement so that
useful information can be provided. The specialist devises a
search plan, then forms questions precisely tailored to the
requirements of his information resources. Intermediate results
may cause him to revise the search plan. When the search is
complete, the specialist may qualify the answer to account for
the accuracy or currency of his sources. He can also answer
questions about the information resources themselves, so that the
user can get a better idea of what kinds of problems the
specialist can handle. An intelligent database interface with
these capabilities is more than an interface; it is a broker and
manager of information. Current artificial intelligence research
bears on each of these tasks.


Formulating  Questions 

Natural language research aims to increase linguistic coverage,
and there has obviously been much progress along this line.
Still, we are far from guaranteeing that all NL input is properly
understood. There are two interim solutions. The first is to
clarify the question by interrogating the user. The RENDEZVOUS
system [Codd 78] does this. RENDEZVOUS' dialogues can be
cumbersome, largely because clarification is carried out in terms
of underlying database concepts; still, RENDEZVOUS achieves a zero
error rate. (The IFAP system [McLeod 78] is another clarification
interface, but for a formal query language; its clarification
dialogues are also cumbersome and based on database concepts.)
The second interim solution is to constrain NL input so the
question is sure to be understood. Texas Instruments is
developing a system in which the user builds an English sentence
by selecting from menus of acceptable continuations [Tennant 83].
Choices are expressed in terms of sentence words, not underlying
database concepts. Selection using a pointing device also reduces
typing to a minimum. Users can't freely compose a sentence, but
can usually express what they want.
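The second interim solution can be sketched as a small grammar that only ever offers acceptable continuations. This hypothetical Python illustration uses a finite-state grammar whose states and phrases are invented for the example; it is not a description of how the Texas Instruments system [Tennant 83] is built.

```python
# Hypothetical toy grammar: for each state, the phrases the user may select
# next and the state each selection leads to (None = sentence complete).
GRAMMAR = {
    "start": {"Find": "object"},
    "object": {"managers": "manager_filter", "departments": "department_filter"},
    "manager_filter": {"who earn over $40K": None, "in Toronto": None},
    "department_filter": {"in Toronto": None, "with over 10 employees": None},
}


def menu(state):
    """Return the continuations the interface would display for this state."""
    return sorted(GRAMMAR[state])


def build_sentence(selections):
    """Assemble a sentence from menu selections, rejecting anything off-menu."""
    state, words = "start", []
    for phrase in selections:
        if state is None or phrase not in GRAMMAR[state]:
            raise ValueError(f"{phrase!r} is not an acceptable continuation")
        words.append(phrase)
        state = GRAMMAR[state][phrase]
    if state is not None:
        raise ValueError("sentence is not yet complete")
    return " ".join(words) + "."


print(menu("object"))
print(build_sentence(["Find", "managers", "who earn over $40K"]))
```

Because the user can only pick phrases returned by `menu`, every completed sentence is one the interface is guaranteed to understand: less freedom of expression in exchange for no misunderstood input.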
Tailoring Questions to Information Resources

Tailoring the question to the available information resources
means producing a better question than the user can by himself:
it is an 'added value' function of interfaces. The interface must
take into account general rules about the database domain. Hafner
[Hafner 79] uses a semantic network of concepts from an area of
the law to add useful qualifications to queries against a
full-text retrieval system of legal documents. The IR-NLI system
[Guida 83] also does this for a different full-text system. The
QUIST system [King 81] changes a query into an equivalent one that
can be processed more efficiently; it uses domain knowledge to
transform the query, and knowledge of database structure and
access methods to tell whether a transformation will be
beneficial. Several systems (e.g. NLP/DBAP [Ginsparg 83], TEAM
[Grosz 83]) are exploring ways to use inference to fill in
necessary information that the user omits.
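The kind of transformation attributed to QUIST can be sketched in Python. Everything in the example is hypothetical: the SHIPS relation and its contents, the domain rule that every ship over 150,000 tons is a tanker, and the assumption that only the `kind` attribute has an access path. The rewritten query adds a restriction that the rule makes redundant, so the answer is unchanged, but a real system could then evaluate it through the index on `kind`; here both versions are simply scanned to show they agree.

```python
# Hypothetical relation SHIPS(name, kind, tonnage).
SHIPS = [
    ("Atlas", "tanker", 210_000),
    ("Borea", "tanker", 90_000),
    ("Corvo", "freighter", 40_000),
    ("Delos", "tanker", 160_000),
]
# Hypothetical domain rule: tonnage > 150_000 implies kind == "tanker".
RULE_THRESHOLD = 150_000
RULE_KIND = "tanker"
# Hypothetical knowledge of access paths: only 'kind' is indexed.
INDEXED_ATTRIBUTES = {"kind"}


def evaluate(query, ships):
    """Return names of ships satisfying every condition in the query (full scan)."""
    result = []
    for name, kind, tonnage in ships:
        if "min_tonnage" in query and not tonnage > query["min_tonnage"]:
            continue
        if "kind" in query and kind != query["kind"]:
            continue
        result.append(name)
    return result


def transform(query):
    """Add the restriction implied by the domain rule, but only when it is
    both valid (the rule applies) and beneficial (the attribute is indexed)."""
    new_query = dict(query)
    if (query.get("min_tonnage", -1) >= RULE_THRESHOLD
            and "kind" not in query
            and "kind" in INDEXED_ATTRIBUTES):
        new_query["kind"] = RULE_KIND
    return new_query


original = {"min_tonnage": 150_000}
rewritten = transform(original)
print(rewritten)
print(evaluate(original, SHIPS), evaluate(rewritten, SHIPS))
```

The equivalence only holds while the stored data obey the rule, which is why such rules would typically be enforced as integrity constraints.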


Devising  Retrieval  Strategies 

Figuring out how to answer a question is essentially a planning
process. Furukawa [Furukawa 79] posed retrieval from deductive
databases as a problem of planning a search strategy. The IR-NLI
system mentioned above is consciously modelled on the role of the
information retrieval specialist, in the realm of full-text
retrieval using keyword descriptors. The aim is to produce an
expert system whose expertise is a model of retrieval strategies
[Bates 79]. After helping to formulate the query, IR-NLI is
designed to revise it in light of intermediate results and produce
a final search strategy that satisfies various criteria imposed
by the user. Ideally, an intelligent database interface should
produce an "opportunistic plan" [Hayes-Roth 78] that augments the
basic retrieval plan by getting other information of use to the
user or, if it determines that the question can't be answered
precisely, answers a different question that may nevertheless be
helpful. Current work in planning utterances [Appelt 82] and
communicating between cooperative agents [Konolige 80] is
relevant to this.

Explaining  and  Qualifying  Answers 

Some intelligent database interfaces have been designed to
correct faulty presuppositions when a question fails ([Janas 79],
[Kaplan 79]). This does not require knowledge of general domain
rules, just an analysis of the subparts of the query against
ground facts in the database. Current research [Webber 82]
concerns when and how to offer explanations with answers in
general, not just when the answer is empty, and also when answers
are produced by a combination of ground facts and general domain
rules.

Responding  to  General  Questions 

The inclusion of general domain rules and metaknowledge
(knowledge about the general contents and structure of the
database) makes it possible to respond to general questions.
McKeown [McKeown 80] uses an augmented database schema to support
natural language questions about the structure of the database.
Konolige has proposed [Konolige 81] representing the mapping
between real-world concepts and database contents formally, as a
metalanguage and a database language which is its intended model.
Database queries are then formally derived from metalanguage
questions stated in terms of real-world concepts. It is also
possible to derive whether a particular kind of database query
can be asked, thus identifying areas where the database is
incomplete. Work on FOL [Weyhrauch 80] and SPHERE [Filman 83] is
along these lines as well. To respond to general questions, it is
also essential to extend the linguistic analysis so that it is not
assumed that a question translates directly into a database
query. Work on KLAUS [Hendrix 80] is taking up this question.

Integrating  Multiple  Information  Resources 

Several systems are now under development to permit a single
interface to reference several information resources and to
control the operation of related programs, including expert
systems [Weiss 82]. Portability of natural language interfaces is
being extended to include nonformatted databases [Thompson 83].
The KLAUS project is also explicitly concerned with tying together
multiple information resources with mixed types of data.

These capabilities, combined with advances in natural language
understanding, will constitute a large step forward for the next
generation of intelligent database interfaces. It is only fair to
add that we are a long way from attaining them, because we must
confront very hard problems in the representation and use of
knowledge.


KNOWLEDGE  REPRESENTATION  AND 
DATABASES 

John  Mylopoulos 
Department  of  Computer  Science 
University  of  Toronto 
Toronto,  Ontario 


I’d  like  to  draw  a  sketch  of  what  "AI  and  Databases" 
means  to  me  and  then  concentrate  on  some  of  the  issues  we 
have  been  addressing  with  our  work  at  the  University  of 
Toronto. 

Knowledge  bases  vs.  Databases 

A knowledge base contains knowledge about something, represented
in a suitable notation. A database stores (large amounts of)
data. In the case of databases, there is no commitment to an
interpretation of the information stored in the database; for
knowledge bases, on the other hand, there definitely is such a
commitment. Thus a relational database stores tuples, and one can
ask "Is there a tuple x such that Q(x)?" Moreover, the answer to
this question can only be "yes" or "no". With a knowledge base,
however, we are storing information about people, accounts, etc.,
and a query takes the form "Is there an employee x such that
P(x)?" Now the answers can be "yes", "no" or "I don't know" (and
there are other possibilities too).

The distinction is actually quite clearly reflected in the terms
that have been used for the notations we have developed in order
to construct, manipulate and search databases and knowledge bases.
A Data Model (such as the relational) offers a set of data
structures and associated operations for the construction and
manipulation of a database. A Knowledge Representation scheme or
Conceptual Model offers tools for the representation and
organization of knowledge. The first commits its user to a
particular way of visualizing what is stored in a machine; the
second, to a particular way of viewing the world. The so-called
"semantic data models" seem to offer the promise of a dual
commitment. I consider this approach to the introduction of
semantics wrong. Rather, I prefer the strategy of having a
knowledge base at one level (the conceptual level) and a database
at a different level (the logical level).
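The contrast between "yes"/"no" answers and "yes"/"no"/"I don't know" answers can be sketched in Python for single facts. The facts are hypothetical, and both readings are deliberate simplifications: the database reading makes the closed-world assumption (whatever is not stored is false), while the knowledge-base reading keeps explicitly known negative facts separately and admits ignorance otherwise.

```python
from enum import Enum


class Answer(Enum):
    YES = "yes"
    NO = "no"
    UNKNOWN = "I don't know"


# Hypothetical facts.
KNOWN_TRUE = {("employee", "abel"), ("employee", "baker")}
KNOWN_FALSE = {("employee", "charley")}  # only meaningful to the knowledge base


def ask_database(fact):
    """Closed-world reading: absence of a stored tuple means 'no'."""
    return Answer.YES if fact in KNOWN_TRUE else Answer.NO


def ask_knowledge_base(fact):
    """Open-world reading: absence of a fact is not evidence that it is false."""
    if fact in KNOWN_TRUE:
        return Answer.YES
    if fact in KNOWN_FALSE:
        return Answer.NO
    return Answer.UNKNOWN


for person in ["abel", "charley", "dogg"]:
    fact = ("employee", person)
    print(person, ask_database(fact).value, "|", ask_knowledge_base(fact).value)
```

The two readings only diverge on facts about which nothing is recorded, such as "dogg" here.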

The distinction I am drawing here is by no means absolute. For
example, proposals for the treatment of "null values" (e.g.
[Vassiliou 81]), extensions of the Relational Model (e.g.
[Codd 79]), or revisions (e.g. [Reiter 83]) do lead to versions
of the Relational Model that make some semantic commitments.

Technology Transfer: From AI to Databases

There are three main ways I can see AI contributing to database
technology:

1. Increasing the functionality of a DBMS - with natural language
front ends, deductive question-answering, consistency checking,
planning and the like; clearly this type of technology transfer
is of importance to the DBMS user.

2. Using expert system technology to help the database builder -
by offering expert systems which are fed information on the
characteristics of a database and generate a database
automatically or semi-automatically; here the beneficiary is the
database builder, who gets an apprentice (in the sense used by
[Rich 81]).

3. Using Knowledge Representation work to construct Conceptual
(rather than Data) Models - now the designer is helped in terms
of notational tools, so that he can first describe a knowledge
base and then proceed with the construction of one or more
databases that implement the knowledge base.

Technology  Transfer:  The  Other  Way 

So "our" knowledge bases are much better suited for talking about
the world, while "their" databases are clearly more
implementable. Is there any reason for worrying about
implementability? I think there is, and it's a pragmatic one. We
have a technology that allows us to build knowledge bases whose
size is, say, of order 1K. We need a technology that will allow
us to build knowledge bases whose size is of order 100K. One (but
not the only) obstacle to achieving this goal is performance.
Database research has developed tools to deal precisely with this
issue, and it seems natural that we should turn to them for
inspiration. I should add here that "turning to them" doesn't
necessarily mean adopting, say, relational or DBTG technology. It
may well happen that these technologies become obsolete, due to
advances in hardware and communications research. But the
research area that created these technologies seems to have an
important role to play in dealing with performance issues for
any new technology that tries to replace them.

Apart from performance considerations, there are other features
of database systems, such as error recovery and concurrency,
which are essential for the construction of robust knowledge base
systems.

Knowledge  Representation  and  Databases 

It follows from this discussion that Knowledge Representation has
a dual role to play in database applications:

(a) a direct role in providing ideas for the design of conceptual
models, and

(b) an indirect role in providing tools and methodologies for the
construction of expert systems that aid the designer and/or user
of a database.

Much of our work at the University of Toronto has concentrated on
the first role, and overviews of our results appear in [Brachman
and Smith 80], [Mylopoulos 81], [King 83].

Some  Research  Issues 

Applications of AI, or more specifically of Knowledge
Representation, to databases have raised fundamental research
problems which must be tackled before there can be progress. Let
me mention three such problems:

1.  Representation  of  Social  Knowledge 

Much of the knowledge that is relevant to a large class of
databases is social, in the sense that it involves rules,
regulations and procedures for the operation of a social
structure, e.g. a hospital, an office, a corporation. Such
knowledge is fundamentally different from either (mathematically)
logical knowledge or common-sense knowledge.

Consider, for example, a fact such as "Incoming patients must
present their Hospital Insurance Number", which may be associated
with a hospital admission system. Clearly, we do not want to
treat its enforcement in the same way we would treat, say, "Every
natural number has a successor", or the way we would treat the
enforcement of, say, "Unless you know otherwise, an elephant is
grey".

Designing Knowledge Representation schemes which enforce social
regulations such as the one above, but in a flexible way, appears
to be a difficult and important research question.

2.  Relativism  of  Representation 

The view that for any problem domain we can construct a knowledge
base that presents a global (and consistent) viewpoint is wrong.
Consider a large organization for which we are trying to build a
knowledge base that describes its operations. It is possible,
indeed likely, that each department has its own view of what it
and other departments are doing. Moreover, this view may
contradict the views of other departments. Our knowledge
representation tools must allow us to construct a knowledge base
consisting of a collection of interrelated and possibly
contradictory views.

3.  Compilation  of  Knowledge  Bases 

Even small database applications require hundreds of thousands of
facts to be stored, accessed and manipulated. To build knowledge
bases of comparable size, we must deal with performance issues,
such as the compilation of a knowledge base into a database along
with application programs (e.g. [Nixon 83]), incremental
compilation, and efficient consistency checking [Smith 81].


PRAGMATICS  AND  DATABASE  QUESTION 
ANSWERING 

Bonnie  Lynn  Webber 

Department  of  Computer  &  Information  Science 
University  of  Pennsylvania 
Philadelphia,  PA  19104 
U.S.A. 


Artificial Intelligence research is fortunate to have database
systems around; otherwise, it would have had to invent them (and
probably would not have done as good a job). This is not such an
outrageous claim: database systems provide an ideal environment
for investigating purely AI-type questions (formal reasoning,
common-sense reasoning, Natural Language (NL) use, including
question answering and answer justification and explanation, NL
understanding, etc.), independent of worldly domain. On the other
hand, AI research in the context of database systems has already
benefitted those systems in several ways, one of which is by
making Natural Language input possible. This makes such systems
more accessible to naive or infrequent users by allowing them to
state their requests as they would in everyday conversation.

AI can now benefit database systems in another way: by making
possible system output that follows everyday conversational
principles and practices. In this way, a user's normal
expectations about responses to his/her utterances, and normal
strategies for interpreting those responses, will not fail
him/her when those responses come from a machine.

Since this is only a brief position paper, I will try to make my
point by means of a few short examples. Note that Natural
Language is not the primary issue: misleading or unexpected
responses, even in the King's English, would be just as bad. What
is at issue is standard conversational principles and practices.
My point is that AI can provide the means to enable database
systems (and knowledge-based systems in general) to follow such
principles and practices, and hence to participate in more
effective and satisfying interactions with their users.

(In each of these examples, the first system response is of the
kind one might see today; the second (equally correct) response is
as it could be, were normal conversational practices and
principles being followed.)


Example  1 

U:  Which  department  managers  earn  over  $40K  per  year? 

S1: Abel, Baker, Charley, Dogg
S2:  All  of  them. 

By enumerating managers who do earn over $40K per year, the first
response implicates that there are managers who don't earn that
much. In linguistic pragmatics, this is called a 'scalar
implicature': it follows from the assumption that a cooperative
respondent will say as much as s/he can. In this case, s/he would
only enumerate positive instances if s/he couldn't say the more
inclusive 'All of them'. Thus S1 misleads a user, who assumes the
system would say S2 if it were true. What AI can do here is to
provide tools for (1) computing when such implicatures are in
force (a peculiar thing about implicatures being their potential
cancellability [Gazdar 79]) and (2) reshaping responses to accord
with them.
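Point (2) can be sketched for the simplest case in Python. The salary figures are invented (chosen so that, as in Example 1, all four managers qualify), and the function only checks whether the positive instances exhaust the candidates; it makes no attempt at point (1), deciding when the implicature is cancelled, and it leaves the empty-answer case deliberately simple.

```python
# Hypothetical data: yearly salaries of all department managers, in dollars.
MANAGER_SALARIES = {"Abel": 52_000, "Baker": 47_000, "Charley": 61_000, "Dogg": 44_000}


def answer_which(candidates, predicate):
    """Answer 'Which candidates satisfy predicate?' without a false scalar implicature."""
    matches = [c for c in candidates if predicate(c)]
    if not matches:
        return "None of them."
    if len(matches) == len(candidates):
        # Listing every name would implicate that some candidate fails the test.
        return "All of them."
    return ", ".join(matches) + "."


managers = list(MANAGER_SALARIES)
print(answer_which(managers, lambda m: MANAGER_SALARIES[m] > 40_000))
print(answer_which(managers, lambda m: MANAGER_SALARIES[m] > 50_000))
```

With a $40K threshold this produces the S2-style answer; with a $50K threshold an enumeration is appropriate, because some managers really do fall short.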

Example  2 

U:  Where  does  the  Ubu  Roi  stop  on  her  current  tour? 

S1: Bombay, Cape Town, Honolulu, Lisbon, Los Angeles,
Nantes, New York, Sidney.

S2:  New  York,  Nantes,  Lisbon,  Cape  Town,  Bombay,  Sidney, 
Honolulu,  Los  Angeles. 

Both responses S1 and S2 convey that the cities noted constitute
the stops on the Ubu Roi's current tour. However, an enumerated
response to a question such as U's implicates that the
enumeration corresponds to an expected ordering, in this case
temporal order: first stop, second stop, etc. Both S1 and S2 will
be interpreted in this way (S1 incorrectly so). What AI can do is
specify such expectations, so that responses like S2 are produced
if possible (i.e., if the system can determine the order of stops)
and, if not, an explicit denial is made that the user's
expectations are being met, e.g.,

S1': The stops are (not necessarily in this order): Bombay,
Cape Town, Honolulu, Lisbon, Los Angeles, Nantes, New
York, Sidney.
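The behaviour suggested here can be sketched in Python: give the stops in itinerary order when the database records that order, and otherwise say explicitly that the listing order carries no meaning. The data layout (a position per city, or None when unknown) is an assumption made for the example; the city names and the order are those of S2.

```python
# Hypothetical itinerary data: city -> position on the tour (None = unknown).
TOUR_WITH_ORDER = {
    "New York": 1, "Nantes": 2, "Lisbon": 3, "Cape Town": 4,
    "Bombay": 5, "Sidney": 6, "Honolulu": 7, "Los Angeles": 8,
}
TOUR_WITHOUT_ORDER = {city: None for city in TOUR_WITH_ORDER}


def answer_stops(tour):
    """List the stops, honouring the temporal-order expectation when possible."""
    cities = list(tour)
    if all(position is not None for position in tour.values()):
        ordered = sorted(cities, key=lambda city: tour[city])
        return ", ".join(ordered) + "."
    # Order unknown: cancel the expectation explicitly instead of misleading.
    return ("The stops are (not necessarily in this order): "
            + ", ".join(sorted(cities)) + ".")


print(answer_stops(TOUR_WITH_ORDER))
print(answer_stops(TOUR_WITHOUT_ORDER))
```

The first call reproduces S2 and the second reproduces S1'.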

The two examples above show systems giving unintentionally
misleading responses by ignoring their standard implicatures.
There are also cases where a system's responses are misleading
because they do not disabuse the user of misconceptions s/he
seems to have about the domain or its encoding in the database.
This may lead to the user's drawing false conclusions from the
system's response. This is what I meant earlier by the user
getting an unexpected response: in cooperative human-human
interactions, the other party normally takes responsibility for
helping wherever possible, ideally in a way that does not cause
loss of face.

The  most  common  illustration  of  this  type  of  misleading 
behaviour  comes  from  Kaplan's  work  in  this  area  [Kaplan  81]. 

Example  3 

U: Which Economics students got an F in CIS531 in Fall '82?
S:  None. 

Someone asking such a question must believe, inter alia, that
there were Economics students and that CIS531 was given in Fall
'82. The system's simple response "None" may lead such a person
to conclude that all Economics students got grades other than F,
i.e., that they all passed. If "None" in truth follows from the
fact that there were no such students or the course wasn't given,
then the system has misled the user.
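A minimal Python sketch of such a check, in the spirit of Kaplan's work cited above: when the answer is empty, the question's presuppositions are tested one at a time, most general first, and the first one that fails is reported instead of a bare "None". The relations `MAJORS`, `OFFERINGS` and `GRADES` and their contents are hypothetical.

```python
# Hypothetical ground facts.
MAJORS = {"Ann": "Economics", "Bob": "Physics", "Cal": "Economics"}
OFFERINGS = {("CIS531", "Fall '82"), ("CIS540", "Fall '82")}
GRADES = {
    ("Ann", "CIS531", "Fall '82"): "B",
    ("Bob", "CIS531", "Fall '82"): "F",
}


def which_got_grade(major, course, term, grade):
    """Answer 'Which <major> students got <grade> in <course> in <term>?'."""
    students = {s for s, m in MAJORS.items() if m == major}
    takers = {s for s in students if (s, course, term) in GRADES}
    hits = sorted(s for s in takers if GRADES[(s, course, term)] == grade)
    if hits:
        return ", ".join(hits) + "."
    # Empty answer: test each presupposition of the question, most general first.
    if not students:
        return f"There are no {major} students."
    if (course, term) not in OFFERINGS:
        return f"{course} was not given in {term}."
    if not takers:
        return f"No {major} students took {course} in {term}."
    return "None."  # every presupposition holds, so "None" is not misleading


print(which_got_grade("Economics", "CIS531", "Fall '82", "F"))
print(which_got_grade("Economics", "CIS531", "Fall '81", "F"))
```

As the text notes, this needs no general domain rules: each check is just a subpart of the original query evaluated against the stored facts.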

There are many types of misconceptions a user may have: that a
relation can hold between certain entities when it can't (e.g.,
"Which chickens teach CIS531?" presumes that chickens can teach
graduate courses), that an object has a particular attribute when
it doesn't (e.g., "What's the maximum age for opening a Money
Market Account?"), etc. [Mays 80, Webber 83]. Not all such
misconceptions can be detected, not all can be easily corrected,
and not all make a difference to the system's ability to convey
its information responsibly [Joshi 82]. What I believe AI can
contribute here (and what several of us at Penn are working to do)
is to improve a system's ability to detect and correct
misconceptions detrimental to the successful transfer of
information from the database to the user, thereby increasing the
system's potential for doing well by its users.

One final note: the examples above presume no more of a database
system than standard retrieval capabilities. However, as database
systems begin to take advantage of AI's deductive capabilities in
order to provide answers to a wider range of questions, the issue
of adhering to standard conversational principles and practices
becomes even more important. Though I haven't the space here to
elaborate upon this, I hope the panel will provide an opportunity
to do so.
An Information System stores, retrieves and manipulates information about
some portion of the real world. In an important sense, it can be viewed as a
model of that domain of discourse. Recently, researchers have developed
languages which make it easier to capture the semantics of the real world in the
computerized system, i.e., to model it more faithfully. The purpose of this
paper is to introduce the reader, who is assumed to be familiar with traditional
data processing techniques, to the basic ideas underlying these languages.

We  present  in  this  review  a  list  of  general  features  which  characterize  such 
languages,  and  distinguish  them  from  the  traditional  approaches  to  developing 
database-intensive  applications.  We  also  consider  the  motivation  for  these 
features,  the  advantages  of  using  them,  and  the  methodologies  and  tools  for 
Information  System  development  associated  with  them.  These  ideas  are 
illustrated  with  examples  in  four  programming  languages  which  incorporate 
semantic  modeling  facilities. 
1.  Introduction

The term "Information System" (and the acronym IS) is used in this paper to refer to any
computer system for maintaining and accessing a pool of information on some aspect of a
"real world" domain. Usually, such systems have as a central component a database of
facts, which is then interrogated by users through queries and manipulated by so-called
"application programs". We follow tradition here by restricting our attention to "formatted
data", in contradistinction with free natural language text, pictures or continuous signals.

The  fundamental,  though  often  unstated,  tenet  of  the  work  which  we  will  consider  is  that 
in  an  Information  System 

the symbol system in the computer is a model of the real world

Models are familiar to us from domains such as engineering (e.g., a model of a bridge) and
science (e.g., formulas for the laws of motion). Often, their role is to let us answer
questions about the world more conveniently (quickly, easily) than by going into the real
world and actually performing measurements. This is accomplished by choosing to ignore
some aspects of the real world, which are not considered relevant. Information systems
perform their task in the same way: we can use a library IS to check if a book is taken
out without going to the stacks, but we probably can't find out from it the colour of the
book's cover.

The  above  principle  is  important  because  it  suggests  that  IS  development  is  like  model 
building,  and  that  using  an  IS  is  like  querying  the  model.  Of  course,  some  models  may  be 
more  perspicuous  than  others!  In  this  paper  we  wish  to  survey  several  languages  which 
claim  to  allow  the  description  of  ISs  that  model  the  real-world  enterprise  more  naturally 
and  directly  than  has  been  the  case  traditionally.  The  goal  of  this  approach  is  to  make  it 
easier  to 

•  design and maintain the IS, by choosing a vocabulary which is more
appropriate for the problem domain, and by structuring the IS description as
well as the description process;

•  use  the  IS,  by  helping  the  user  find  the  data  stored,  and  interpret  it  to  obtain 
information. 

Information  systems  actually  deal  with  the  concepts  in  terms  of  which  we  think  of  the 
world:  they  are  conceptual  models.  In  order  to  be  useful  though,  ISs  should  model  the 
user's  conceptualization  of  the  application  domain,  not  the  designer's  independent 
perception,  and  certainly  not  the  way  data  is  stored  in  the  computer.  What  appears  in  a 
conceptual  model  of  some  universe  of  discourse?  According  to  a  rather  naive  philosophy, 
the  mental  world  is  populated  by  conceptual  objects  or  entities,  which  have  associated 
descriptions,  and  are  inter-related  in  meaningful  ways.  Furthermore,  activities  occur  over 
time, with resulting changes in inter-relations. All of these are subject to constraints (e.g., a
book can only be borrowed by one person at a time), which define the concepts and
distinguish "reality" from other possible worlds.

This is not a new observation, and discussions of it appear in the works of Abrial [Abrial 74],
Biller [Biller 78] and Tsichritzis [Tsichritzis & Lochovsky 81], among many others.

The remainder of the paper is organized as follows. We first summarize some problems
with traditional IS development languages in Section 2. Then, in Sections 3 and 4, we
present some of the facilities for modeling the static and dynamic aspects of an enterprise
which distinguish Conceptual Modeling Languages (CMLs henceforth). Finally, we consider
some aspects of novel methodologies for IS development which are based on the use of
CMLs, as well as computer tools supporting them.

There is currently an extensive and growing literature on "semantic data models"; such
work is often concerned only with modeling the information to be stored in the database.
Because we cannot hope to give appropriate attention to all of them here, because their
semantics are notoriously vague, and because an IS is more than just a database, we have
chosen to explicitly consider here only those CMs which have been incorporated into
programming languages. The specific languages to be discussed will be
ADAPLEX [Adaplex 83] (based on the Semantic Hierarchy Model [Smith & Smith 77] and
Daplex [Shipman 81]), DIAL [Hammer & Berkowitz 80] (based on SDM [McLeod
78], [Hammer & McLeod 81]), GALILEO [Albano et al 83], [Albano 83] and TAXIS
[Mylopoulos et al 80], [Wong 81]. We will occasionally also refer to some conceptual
models which are important for historic reasons: Abrial's Binary Model [Abrial 74], Chen's
Entity-Relationship model [Chen 76], and Codd's RM/T extension to the relational
model [Codd 79].

A note on the programming examples in this paper: in order to make the presentation
more concrete, we will occasionally give example descriptions in one or another of the
CMLs. However, it is not our intention to provide a detailed comparison of these languages,
nor to teach the reader their syntax or semantics; we wish to concentrate on the
significant ideas underlying them. For this reason, we will not provide a line-by-line
explanation of the program examples; the reader should be able to infer the approximate
meaning of the constructs from the key words used and the general principles expounded.
The appendix at the end of the report contains a small example which has been worked
out in the four programming languages mentioned above. The purpose of the appendix is to
give a more coherent picture of these languages than the one provided by the fragmented
examples in the main text.


2.  The traditional approach

Consider first how information about the objects and relationships in a conceptual model
of some world appears in "traditional" information systems. The data base management
systems developed commercially in the last decades have emphasized the efficient
processing of large quantities of data. Based on the nature of the data structuring and
operation facilities that they offer, DBMS are often categorized in one of three "classical
data models": hierarchic, network or relational. The notion of "record" (a fixed sequence
of named field values conforming to one of several record schemata) underlies most of
the currently available DBMS in a fundamental way; the hierarchic and network approaches
utilize in addition the notion of link or pointer. One is therefore left with the notions of
"record", "field", "co-occurrence of fields in a record" and "link between records" to
encode all entities and inter-relationships of semantic interest; the "schema" or record type
definition provides the restrictions on potential relationships. So-called "semantic integrity
constraints" are needed to check any further conditions, and these are not available in many
commercial systems.

Kent [Kent 79] has articulately criticized the suitability of record-based information
models on the grounds that

•  the sparsity of basic constructs, noted above, means that they are needed to
represent a great variety of relationship and entity types, and forces these
basic constructs to have very weak semantics; this allows information to be
encoded in many alternative ways, fails to assist the user in interpreting the
data appropriately, and provides relatively little automatic checking for semantic
violations;

•  although originally there appear to be clear uniform patterns for information in
any particular world, many "inhomogeneities" arise after further consideration,
and these lead to difficulties in designing proper record formats;

•  records are frequently required to have "key fields" for unique identification
by users, and these are often problematic in real life; this problem is
compounded by the practice of stating relationships between the identifiers of
entities, rather than between the entities themselves;

•  in many, though not all, record-based DBMS the description of the database is
not accessible to users in the same way as the facts stored in it.

We will return to some of these points later in the paper, when we consider how CMs
attempt to resolve such problems.

As for the modeling of the "dynamic" aspects of an enterprise, this has traditionally been
accomplished by writing application programs in some standard programming language (e.g.,
COBOL, PL/I), and having these programs invoke commands provided by the data base
management system. This approach itself has a number of drawbacks, including:

•  except for COBOL, these are general purpose programming languages, which
have not been designed with the needs of IS modeling in mind, and hence
provide few features which would make this task easier;

•  the data type structure of the host programming language is not integrated
with that of the DBMS, which may force translation between the run-time
representation and storage representation; this leads to inefficiency and
inconsistency.

The reader is assumed to be familiar with these notions (see, for example, Date [Date 81] or
Tsichritzis [Tsichritzis & Lochovsky 81]); also note that this use of the word "model" is distinct from the
sense of modeling noted before.

Experimental  languages  such  as  RIGEL  [Rowe  79],  PASCAL/R  [Schmidt  77],  and 
PLAIN  [Wasserman  et  al  81]  have  attempted  to  integrate  facilities  for  relational  databases 
into  a  more  or  less  traditional  programming  language;  however,  since  their  persistent  data 
types  are  based  on  the  relational  model,  they  are  subject  to  the  problems  noted  earlier. 
PS-Algol  [Atkinson  et  al  82]  allows  all  data  to  be  stored  permanently,  but  it  does  not 
provide  features  specifically  suited  to  the  modeling  of  the  entities  and  activities  associated 
with  Information  Systems. 

In the next two sections, we aim to introduce the reader to the novel features of the
CMLs. We will do so by pointing out general principles which underlie these languages, and
showing why these help with modeling.


3.  Semantic  Modeling  of  Entities  and  Associations 

3.1.  Fundamental  features  of  Semantic  Data  Modeling 

The following features appear to be common to all CMLs.

***************************************************************
*  Objects in the model correspond to entities in the world.  *
***************************************************************

One  fundamental  tenet  of  work  in  CM  is  the  need  for  a  natural  one-to-one 
correspondence  between  the  entities  in  the  enterprise  and  the  objects  in  the  model.  This 
means that the same entity is not represented by more than one thing in the model (as
would be the case if there were a record representing a person as a student, and a separate
record representing the same person as an instructor). Furthermore, the same object in the
model should not represent more than one entity (e.g., a record which contains information
describing both a book and its author).

The  violation  of  this  basic  principle  is  the  source  of  many  problems  in  database  design, 
including the "insertion and deletion anomalies" of the relational model [Date 81]. For
example,  when  a  book  is  removed,  we  could  lose  all  information  about  its  author,  or  when 
a  person  leaves,  we  may  remove  her  from  the  list  of  instructors  but  forget  to  do  so  from 
the  list  of  students. 
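To make the anomaly concrete, here is a minimal Python sketch (Python is our choice for illustration; none of the CMLs discussed is involved, and the names `Person`, `Model`, `classify` and the record fields are hypothetical). The record-style half stores the same person twice and updates only one copy. The object-style half keeps a single object whose roles are class memberships, so removing the entity removes every role at once.

```python
class Person:
    def __init__(self, name):
        self.name = name


class Model:
    """Hypothetical object-style store: each class is a set of objects, and
    one object may belong to several classes (roles) at the same time."""

    def __init__(self, *class_names):
        self.extents = {name: set() for name in class_names}

    def classify(self, obj, *class_names):
        for name in class_names:
            self.extents[name].add(obj)

    def remove(self, obj):
        # Removing the entity removes every role it plays: no stale copies.
        for extent in self.extents.values():
            extent.discard(obj)


# Record style: one person, two unrelated records (illustrative data).
student_records = [{"name": "Anna", "student_no": 101}]
instructor_records = [{"name": "Anna", "staff_no": 7}]
instructor_records.clear()  # Anna leaves, but only one list is updated;
                            # her student record silently survives.

# Object style: a single object playing two roles.
model = Model("Students", "Instructors")
anna = Person("Anna")
model.classify(anna, "Students", "Instructors")
model.remove(anna)
```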

******************************************************
*  An object is not the same thing as a name for it  *
******************************************************

In CMs, an object can exist without having a proper name (a unique external identifier),
and yet be distinct from other objects. Contrast this with the situation in the relational
model, where an object, represented by a record, cannot be stored without having a
corresponding visible "name": the primary key. For example, in an IS supporting a
library, the object modeling an author may be described by its last name and initials. Usually
this is not sufficient to distinguish each author (there are several W. Scotts); however, in
setting up the database the librarian can use his general knowledge to assign appropriate
author entities to books. This means that we can still expect a correct answer to the
query asking for all the books written by the author of "Waverly": books on Economics,
written by some other W. Scott, will not be returned.

For further details, consult the work of Atkinson [Atkinson et al 82], who has considered in depth the
problems of data persistence.

In addition, this avoids the problem of exceptions to the rules of unchanging unique
names [Kent 79], [Codd 79]. Such problems arise, for example, when the proper name
is missing (newly arrived immigrants may not have Social Security numbers), is modified (a
book may be reclassified, and given a new call number), or when there are several equally
good names.
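The following Python sketch, under assumed names (`Entity`, `Author`, `Book`, `books_by_author_of`) and illustrative data, shows identity held separately from names: two authors with the same printed name remain distinct, a book with no call number can still be stored, and changing a call number does not change which book it is.

```python
class Entity:
    """Hypothetical base class: an object's identity is the object itself,
    not any user-visible name or key (equality and hashing are by identity)."""

    def __init__(self, **attributes):
        self.__dict__.update(attributes)


class Author(Entity):
    pass


class Book(Entity):
    pass


def books_by_author_of(title, books):
    """All books sharing an author with the book of the given title."""
    target = next(b for b in books if b.title == title)
    return [b for b in books if b.written_by & target.written_by]


# Two distinct authors with the same printed name (illustrative data).
scott_novelist = Author(last_name="Scott", initials="W.")
scott_economist = Author(last_name="Scott", initials="W.")

waverly = Book(title="Waverly", call_number="PR 5322 W4",
               written_by={scott_novelist})
economics = Book(title="Economics", call_number=None,  # no name, still storable
                 written_by={scott_economist})
library = [waverly, economics]

# Reclassification: the "name" changes, the entity does not
# (the new call number is hypothetical).
waverly.call_number = "PR 5322 W4a"
```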

***********************************************************
*  State relationships between entities, not their names  *
***********************************************************

Generally, mathematical relations are used to model the inter-relationships of entities in
the world. An important consequence of the preceding principle is that relations are
asserted between objects, not their names or descriptions. Thus, to represent the fact that
Pierre borrowed a particular copy of a book, the Loaned-To relationship relates the
corresponding entities in the IS, rather than relating Pierre's student number and the book's
call-number. CMs also adopt some consistent rule which ensures that an entity cannot be
removed as long as it participates in some relationships. This resolves the problem of
"dangling references", which arises in record-based systems when the name in a
relationship no longer refers to an entity because the record describing it has been deleted,
or its key has changed.
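A minimal sketch of both points, assuming hypothetical `Relation` and `Store` classes: the Loaned-To relation holds the objects themselves, and the store applies one consistent removal rule, refusing to delete an entity that still participates in some relationship. Refusal is only one possible rule (cascading deletion would be another), and we do not claim that any particular CML uses exactly this one.

```python
class Thing:
    """Minimal model object; equality and hashing are by identity."""

    def __init__(self, label):
        self.label = label


class Relation:
    """A binary relation stored as pairs of objects, not of names or keys."""

    def __init__(self, name):
        self.name = name
        self.pairs = set()

    def relate(self, a, b):
        self.pairs.add((a, b))

    def unrelate(self, a, b):
        self.pairs.discard((a, b))

    def involves(self, x):
        return any(x is a or x is b for a, b in self.pairs)


class Store:
    """Hypothetical object store enforcing one consistent removal rule:
    an entity cannot be removed while it participates in a relationship."""

    def __init__(self, relations):
        self.objects = set()
        self.relations = list(relations)

    def add(self, x):
        self.objects.add(x)
        return x

    def remove(self, x):
        for r in self.relations:
            if r.involves(x):
                raise ValueError(f"{x.label} still participates in {r.name}")
        self.objects.discard(x)


loaned_to = Relation("Loaned-To")
store = Store([loaned_to])
pierre = store.add(Thing("Pierre"))
waverly_copy = store.add(Thing("Waverly, copy 2"))
loaned_to.relate(waverly_copy, pierre)
```

Because the pairs hold objects, relabelling `pierre` (the analogue of a changed key) leaves the loan intact.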

The above three points are fundamental to all CMLs, and have been present in the
database context since the seminal work of Abrial [Abrial 74] and Chen [Chen 76].

****************************************************************
*  Three special semantic relations: attribute, type, subtype  *
****************************************************************


As observed by McLeod and Smith [Brodie & Zilles 81], CMs single out three semantic
relations for special treatment: has-attribute, has-type/has-instance, and has-subtype.

An attribute is a function of the object to which it applies, and the collection of the
attributes of an object together with their values describes the object. For example, the
attributes of a specific book in a library include a Title, whose value may be "Waverly", an
Author, W. Scott, a Publisher, Oxford Press, and a Call Number, PR 5322 W4.

There are usually two significant differences between a library book having as one of its
attributes Borrower, say, and explicitly establishing the binary relation LOANED_TO between
books and people: (i) in the former case the relation is usually one-directional, from books
to persons, while in the second case it is symmetric; and (ii) only in the second case can
the relation have an attribute of its own, specifying for example the date when the book
was borrowed.
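The two alternatives can be contrasted in a short Python sketch (all names and the date are illustrative assumptions). `Book.borrower` is an attribute navigable only from book to person, whereas each `LoanedTo` tuple belongs to an explicit relation that can be traversed from either side and carries its own `date_borrowed` attribute.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(eq=False)  # eq=False: identity-based equality and hashing
class Person:
    name: str


@dataclass(eq=False)
class Book:
    title: str
    borrower: Optional[Person] = None  # attribute: navigable book -> person only


@dataclass(frozen=True)
class LoanedTo:
    """One tuple of an explicit LOANED_TO relation, with an attribute of its own."""
    book: Book
    person: Person
    date_borrowed: date


LOANED_TO = set()


def books_loaned_to(person):   # relation traversed from the person side
    return {t.book for t in LOANED_TO if t.person is person}


def borrower_of(book):         # ...or from the book side
    return next((t.person for t in LOANED_TO if t.book is book), None)


pierre = Person("Pierre")
waverly = Book("Waverly")
waverly.borrower = pierre      # attribute style: finding Pierre's books needs a scan
LOANED_TO.add(LoanedTo(waverly, pierre, date(1984, 2, 1)))  # illustrative date
```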

A particular book, such as Waverly, is related to the object BOOKS, representing the
generic concept of book or the class of books, by the instance-of relationship. Usually,
every object is required to be the instance of at least one class. As a result, objects in a
CML have a type, and the description of the class imposes constraints on the attributes
which are applicable to an object as well as their values. Thus, all BOOKS can have a Title,
which must be a string, a Publisher, which must belong to the class of known publishers,
etc. In the CMLs considered here, one object is allowed to be an instance of more than
one class, mirroring the fact that in the real world entities have multiple roles.

Finally, classes are themselves related to each other by the subclass relationship, as
between SHORT_TERM_LOANS and BOOKS, or COMPUTER_BOOKS and BOOKS. This is
particularly useful when there are many classes, since the resulting hierarchy, often referred
to as the IS-A hierarchy, organizes the class descriptions and often eliminates
unnecessary duplication: if COMPUTER_BOOKS form a subclass of BOOKS, then there is
no need to repeat for COMPUTER_BOOKS all the things mentioned in the definition of
BOOKS, since these are automatically true of the subclass. This inheritance of attributes
and constraints from a class to its specializations appears to be a useful device to
abbreviate descriptions.

Why  do  CMLs  offer  these  particular  semantic  relations  as  primitives?  Firstly,  because  they 
occur  very  frequently  in  modeling  a  world.  Secondly,  because  they  turn  out  to  form  the 
bases  of  important  axes  for  the  organization  of  large  descriptions  (see  section  5).  And 
finally,  because  the  semantics  of  these  relationships  can  then  be  incorporated  into  the 
language  (and  its  compiler  or  interpreter),  so  that  certain  consistency  rules  no  longer  need 
to  be  stated  or  checked  explicitly  by  the  programmer  [Mylopoulos  &  Wong  80],  [Borgida 
&  Wong  81].  For  example,  as  noted  above,  the  definition  of  a  class  constrains  the  values 
of  the  attributes  of  its  instances.  Other  rules  require  that  the  instances  of  a  subclass 
belong  to  all  superclasses  of  that  class,  and  attributes  defined  on  one  class  must  also  be 
defined  for  all  subclasses. 

The time has come to clarify these ideas by an example. The class definitions in Figure
3-1 are part of an IS supporting a library, and have been given in the language ADAPLEX. We
hope  that  the  preceding  discussion  makes  the  language  constructs  largely  self-explanatory 
and  invite  the  reader  to  find  examples  of  the  principles  noted  above. 
type Authors is
  entity
    first_name: STRING(1..20);
    last_name: STRING(1..40);
  end entity;

type Books is
  entity
    call#: INTEGER;
    title: STRING(1..120);
    publisher: Publishers;
    writenBy: set of Authors;
  end entity;

unique call# within Books;

subtype Computer_book is Books
  entity
    computing_reviews_codes: set of STRING(1..40);
  end entity;

subtype Short_term_loan_books is Books
  entity
    for_course: Course;
    requested_by: Instructors;
  end entity;

type Loans is
  entity
    item: Books;
    loanedTo: Borrowers;
    dueDate: Dates;
    renewalsLeft: INTEGER range 0..2;
  end entity;

Figure 3-1: Class and subclass definitions in ADAPLEX
Although the above example does not show it, a class can be defined to be the subclass
of two or more classes: teaching assistants are both students and employees. Finally, in
TAXIS and GALILEO one can refine the restrictions on the values of attributes when
defining a subclass. For example, the definition of Computer_book could have included

     publisher: Science_publishers;

to specify that these books must be published by Science_publishers, a subclass of
Publishers.
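The rules described in this section (typed attributes, inheritance along IS-A links, membership of subclass instances in every superclass, and refinement of an inherited attribute's value class) can be sketched in Python as follows. `ModelClass`, `Instance` and the one-predicate-per-attribute representation are hypothetical, chosen for brevity. The sketch does not check that a refinement actually narrows the inherited restriction, and with several superclasses it lets later parents override earlier ones; both are simplifying assumptions.

```python
class Instance:
    def __init__(self, cls, values):
        self.cls = cls
        self.values = dict(values)


class ModelClass:
    """Hypothetical CML-style class: typed attributes, IS-A parents, an extent."""

    def __init__(self, name, attributes=None, isa=()):
        self.name = name
        self.isa = tuple(isa)
        self.own_attributes = dict(attributes or {})  # attr -> value predicate
        self.extent = set()

    def attributes(self):
        """Inherited attributes, overridden (refined) by the class's own ones."""
        merged = {}
        for parent in self.isa:
            merged.update(parent.attributes())
        merged.update(self.own_attributes)
        return merged

    def create(self, **values):
        spec = self.attributes()
        for attr, value in values.items():
            if attr not in spec:
                raise TypeError(f"{self.name} has no attribute {attr!r}")
            if not spec[attr](value):
                raise TypeError(f"invalid value for {self.name}.{attr}")
        obj = Instance(self, values)
        self._insert(obj)
        return obj

    def _insert(self, obj):
        # An instance of a subclass is also an instance of every superclass.
        self.extent.add(obj)
        for parent in self.isa:
            parent._insert(obj)


def is_string(v):
    return isinstance(v, str)


Publishers = ModelClass("Publishers", {"name": is_string})
Science_publishers = ModelClass("Science_publishers", isa=[Publishers])
Books = ModelClass("Books", {
    "title": is_string,
    "publisher": lambda v: v in Publishers.extent,
})
Computer_books = ModelClass("Computer_books", {
    # Refinement: the publisher must belong to the subclass Science_publishers.
    "publisher": lambda v: v in Science_publishers.extent,
}, isa=[Books])
```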


3.2.  Additional  features  of  CMLs 

The  following  features,  though  not  present  in  all  CMLs,  provide  additional  ways  of 
capturing  the  semantics  of  an  enterprise. 

*************************************
*  Classes are objects themselves.  *
*************************************

Not only can one attach definitional information to classes, but in some cases (TAXIS,
SDM) classes themselves are objects and therefore may have their own attributes. This
provides a natural repository for summary information (e.g., the number of items, or the
length of time for which various subclasses of books can be loaned out), and meta-data
(e.g., the Publisher of an instance of Computer_book must belong to the class of
Science_publishers).

For example, in TAXIS one could define the metaclass Material_kinds, whose instances
would be various classes of lending materials, including different classes of books.

metaclass Material_kinds with
  attributes
    loanDuration: Integers;
    numberOfItems: Integers;
end Material_kinds;

dataclass Short_term_loan_books in Material_kinds isa Books with
  attributes
    requestedBy: Instructors;
    ...
end Short_term_loan_books;

Short_term_loan_books'loanDuration <- 3;

dataclass Regular_loan_books in Material_kinds isa Books;

Regular_loan_books'loanDuration <- 14;

Figure 3-2: Metaclass and instances in TAXIS

Based on the definitions in Figure 3-2, the expression Regular_loan_books'loanDuration
would then evaluate to 14, presumably indicating that books of this kind can be taken out
for 14 days. Note that this is the same syntax as for evaluating the attributes of ordinary
objects: if x belongs to Short_term_loan_books, then x'requestedBy yields the person
who requested that this book be kept on restricted loan. Furthermore, in TAXIS the
expression Short_term_loan_books'requestedBy will have as value the class-object
Instructors. This allows us to access uniformly both the description of an IS and the
facts currently stored in it.
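Python happens to treat classes as objects too, so its metaclasses give a rough analogue of Figure 3-2. In this sketch `MaterialKind`, `ShortTermLoanBooks` and `RegularLoanBooks` are hypothetical stand-ins for the TAXIS classes. `loan_duration` and `number_of_items` are attributes of the class objects, read with the same dot syntax as instance attributes, and the declared class of `requested_by` is available as meta-data through `typing.get_type_hints`. This is an analogy, not a description of how TAXIS is implemented.

```python
from typing import get_type_hints


class Instructor:
    def __init__(self, name):
        self.name = name


class Book:
    def __init__(self, title):
        self.title = title


class MaterialKind(type):
    """Hypothetical analogue of the TAXIS metaclass Material_kinds: its
    instances are classes, which therefore carry attributes of their own."""

    def __new__(mcls, name, bases, namespace, loan_duration=None):
        cls = super().__new__(mcls, name, bases, namespace)
        cls.loan_duration = loan_duration   # class-level attribute, in days
        cls.number_of_items = 0             # summary information
        return cls

    def __init__(cls, name, bases, namespace, loan_duration=None):
        super().__init__(name, bases, namespace)

    def __call__(cls, *args, **kwargs):
        # Creating an instance updates the summary kept on the class object.
        obj = super().__call__(*args, **kwargs)
        cls.number_of_items += 1
        return obj


class ShortTermLoanBooks(Book, metaclass=MaterialKind, loan_duration=3):
    requested_by: Instructor            # attribute declaration (meta-data)

    def __init__(self, title, requested_by):
        super().__init__(title)
        self.requested_by = requested_by


class RegularLoanBooks(Book, metaclass=MaterialKind, loan_duration=14):
    pass
```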

*****************************************
*  Attributes can have multiple values  *
*****************************************

Many  CMLs  recognize  the  utility  of  multi-valued  attributes  (e.g.,  a  book  can  have  several 
authors)  and  allow  attributes  to  have  as  values  sets  or  sequences  of  entities.  In  such  cases, 
the  CMLs  provide  operators  to  create  and  manipulate  sets,  including  built-in  functions  to 
count and order them. Often, they also provide conventions which make it much easier to
express iterative procedures.⁴

For example, in ADAPLEX and DIAL attributes are seen as functions, and the expression
f(x) is computed as {f(z) | z ∈ x} if x is a set, but as f(x) otherwise. As a result, using the
example in Figure 3-1, name(publisher(y)) would yield the name of the (sole) publisher of book
y, while name(writenBy(y)) would yield the set of names of the authors of book y.
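The ADAPLEX/DIAL convention can be sketched in Python as a helper that maps an attribute function over a set and applies it directly otherwise. The helper `apply_attr`, the accessor functions and the sample data (including the second author) are assumptions for illustration; in particular, nested sets are not flattened here.

```python
from typing import NamedTuple


class Publisher(NamedTuple):
    name: str


class Author(NamedTuple):
    name: str


class Book(NamedTuple):
    publisher: Publisher          # single-valued attribute
    written_by: frozenset         # multi-valued attribute: a set of Authors


def apply_attr(f, x):
    """Apply attribute function f to x; if x is a set, apply it elementwise,
    i.e. compute {f(z) | z in x}. Nested sets are not flattened (an assumption)."""
    if isinstance(x, (set, frozenset)):
        return {f(z) for z in x}
    return f(x)


def name(obj):
    return obj.name


def publisher(book):
    return book.publisher


def written_by(book):
    return book.written_by


y = Book(Publisher("Oxford Press"),
         frozenset({Author("W. Scott"), Author("Hypothetical Co-author")}))

# name(publisher(y)) -> a single name; name(written_by(y)) -> a set of names
single = apply_attr(name, apply_attr(publisher, y))
several = apply_attr(name, apply_attr(written_by, y))
```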

**************************************************************
*  Descriptions contain redundancy and computed information  *
**************************************************************

CMLs allow us to define derived information, such as computed attributes, or classes
whose instances are checked procedurally rather than explicitly stored.⁵ Redundancy is
important because it accommodates differing viewpoints of users, and because it makes it
more likely that the design of the IS is complete.

For example, even in the presence of the Loans class, some users may prefer to think
of overdue books as a subclass of books, which have new attributes such as the
borrower and the fine. Of course, this information can be obtained from the Loan instance
which has this particular book as its item. In Figures 3-3 and 3-4 we show how such classes
could be defined in DIAL and GALILEO respectively.

Note  that  in  CMLs,  the  syntax  for  dealing  with  derived  attributes  and  classes  is  the  same 
as  for  stored  information,  so  that  the  user  is  not  aware  of  implementation  decisions. 


⁴ The advantages of such an approach were pointed out by Buneman in his work on the Functional Query
Language [Buneman & Frankel 79].

⁵ Such derived information can be defined using the view mechanism of certain relational systems such as
System R.
class Overdue_books derivation: Books with
    Count(Loans with item = self and Before(dueDate, $Today)) > 0
  attributes
    borrower
      derivation: loanedTo of match to Loans on item
    fine
      derivation: calculation(
        $overdueRate * Difference(dueDate, $Today))

Figure 3-3: Derived class and attributes in DIAL

Overdue_books
  restriction of Books
  with some x in Loans with this = item of x & Before(dueDate of x, $today)

class Overdue_book <->
  (is Book and
   borrower := derived
     loanedTo of get Loans with item = this
   and
   overdueFine := derived
     overdueRate * difference($today, dueDate of get Loans with item = this))

Figure 3-4: Derived class and attributes in Galileo
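For comparison with Figures 3-3 and 3-4, the same derived class and derived attributes can be sketched in Python (all names, the fixed `TODAY` date and the `OVERDUE_RATE` value are hypothetical). `borrower` and `overdue_fine` are properties, so they are read exactly like stored attributes, and membership in the overdue class is computed on demand from the stored loans rather than being stored itself.

```python
from dataclasses import dataclass
from datetime import date

TODAY = date(1984, 3, 10)   # hypothetical stand-in for $today
OVERDUE_RATE = 10           # hypothetical fine, in cents per day
LOANS = []                  # stored extent of the Loans class


@dataclass(eq=False)
class Book:
    title: str

    def _loan(self):
        return next((ln for ln in LOANS if ln.item is self), None)

    @property
    def borrower(self):      # derived attribute, read like a stored one
        ln = self._loan()
        return ln.loaned_to if ln else None

    @property
    def overdue_fine(self):  # derived attribute computed from the Loan
        if not is_overdue(self):
            return 0
        return OVERDUE_RATE * (TODAY - self._loan().due_date).days


@dataclass(eq=False)
class Loan:
    item: Book
    loaned_to: str
    due_date: date


def is_overdue(book):
    ln = book._loan()
    return ln is not None and ln.due_date < TODAY


def overdue_books(books):
    """Derived class: membership is computed on demand, never stored."""
    return [b for b in books if is_overdue(b)]
```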


***********************************
*  Stating integrity constraints  *
***********************************

The inter-relationships of entities are subject to constraints in the world. Some of these
constraints can be stated using special constructs in CMLs.

•  Ranges for attribute values: as we showed above, in all CMLs the possible values of
an attribute are constrained by its type definition. These "domain constraints" go far
beyond the conditions on the numeric or string format of the data which are
provided by most traditional DBMS.

•  Cardinality of relations: for example, in the Binary Model one can express a lower
and upper bound on the number of entities which can be related to a single value by
a relation R and its inverse R⁻¹; thus Parents is a (2,2) mapping, while its inverse is
(0,∞), indicating that a person must have exactly 2 parents and may have 0 or more
children.

•  Identifying attributes: one or more attributes may be marked as unique identifiers (or
some equivalent term), to constrain each object to have a distinct combination of
values for these attributes (e.g., the unique assertion in Figure 3-1).

•  Optional vs. required attributes, mutable vs. immutable attributes: CMLs usually allow
attributes to be constrained so that their value is always known (i.e., cannot be null);
also, some attributes, once assigned a value, cannot be changed (e.g., book title).

•  Constraints on the sets of instances of classes: since an entity may belong to more
than one class, ADAPLEX and GALILEO allow the designer to specify explicitly
whether two classes may overlap or whether some group of classes partitions some
superclass.

•  General integrity constraints: ADAPLEX allows general integrity constraints to be
stated in a powerful language which includes quantifiers.

Facilities for expressing some such constraints also appear in some traditional DBMS.
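Several of these constraint kinds can be expressed as checks over extents and relations. The Python sketch below uses hypothetical helper names and toy data. Note that the (2,2) Parents constraint is checked only for people whose parents are recorded, since no finite database can satisfy it for everyone.

```python
import math
from collections import Counter


def cardinality_ok(pairs, domain, low, high):
    """Each x in domain is related by `pairs` to between low and high objects."""
    counts = Counter(x for x, _ in pairs)
    return all(low <= counts[x] <= high for x in domain)


def inverse(pairs):
    return {(y, x) for x, y in pairs}


def unique_ok(objects, key):
    """Identifying attributes: no two objects share the same key value(s)."""
    values = [key(o) for o in objects]
    return len(values) == len(set(values))


def partitions(sub_extents, super_extent):
    """Subclasses are pairwise disjoint and together cover the superclass."""
    seen = set()
    for extent in sub_extents:
        if seen & extent:
            return False
        seen |= extent
    return seen == super_extent


# Parents as a (2,2) mapping, its inverse as (0, infinity).
parents = {("cy", "ann"), ("cy", "bob")}
people = {"ann", "bob", "cy"}
parents_ok = cardinality_ok(parents, {"cy"}, 2, 2)
children_ok = cardinality_ok(inverse(parents), people, 0, math.inf)
```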


3.3.  On  the  type  structure  of  CMLs 

In  the  above,  we  have  concentrated  on  the  CML  facilities  for  modeling  entities  and  their 
relationships.  A  brief  look  at  the  general  type  structures  offered  by  these  languages  may 
be  instructive. 

All  languages  offer  primitive  types  such  as  integers,  strings,  booleans  and  enumerations, 
as  well  as  objects  with  attributes.  Except  for  TAXIS,  the  other  CMLs  also  provide  the 
convenience  of  multi-valued  attributes  through  set  or  sequence  expressions. 

Because they concentrate on data processing applications, neither TAXIS nor DIAL offers
other kinds of types, such as arrays.

ADAPLEX is the result of adding to ADA a version of the semantic modeling facilities of
DAPLEX [Shipman 81]. Programmers can then use the full ADA type structure in
procedures but are restricted to storing in the database only values of primitive or object
types, or sets thereof. This means that ADA abstract types or arrays cannot be stored
permanently; furthermore, sets can only have persistent objects as elements. These
restrictions result from the fact that ADA compilers are very hard to build: none is yet
commercially available. For this reason, the implementors of ADAPLEX have chosen to
implement only the portion of ADAPLEX that deals with object classes (i.e., data modeling),
and to pass the remainder of the ADAPLEX program to the (future) ADA compiler.

Finally,  GALILEO  has  an  extremely  rich,  yet  uniform  type  structure.  The  basic  types  of 
GALILEO  are  integers,  strings,  booleans  and  objects.  Composite  types  can  be  obtained  by 
the  arbitrary  use  of  cartesian  product,  disjoint  union  and  sequence  operators,  as  well  as 
function definition. In the resulting language, functions are first-class citizens
(e.g., there are functions of functions), and GALILEO proposes that all information be
uniformly persistent.


4.  Modeling  dynamic  aspects  of  the  enterprise  in  CMLs 

CMLs emphasize the need to model the activities of the enterprise as an integral part of
the conceptual model, claiming that this information is important in the final definition of
the IS [De Antonellis & Zonta 81]. As with data, it is important to capture in the model
the semantics of the real-world activities, and to do so in a direct and natural manner.

The CM programming languages offer primitive data manipulation facilities such as adding
objects to and removing them from classes, and setting and retrieving their attribute values.
In addition, they provide control structures for expressing conditional and iterative execution
of statements, with special concern for manipulating groups of objects, such as all the
instances of a class. These languages also offer the notion of transaction, a procedure
which is guaranteed to leave the system in a consistent state, and facilities for raising and
handling exceptions. Exceptions can be used as a mechanism for concentrating on normal
cases in transaction development, leaving special, abnormal cases to be treated in a separate pass.

Information  about  the  semantics  of  activities  may  appear  in  a  number  of  forms  in  a 
conceptual  model. 

Describing events as objects

In CMs such as the original SDM [McLeod 78] or RM/T [Codd 79], objects can be
classified as being of "event type". In RM/T, events are those entities which have an
associated time of occurrence or of initiation/termination. RM/T then provides special semantic
relations, such as "must be followed by" and "may be followed by one of", between event types.
One can then state semantic constraints on real-world event sequences (e.g., lending
a book must be followed by the return of that book), and have the system check such
relations, for example by checking occurrence times.
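A minimal Python sketch of checking a "must be followed by" relation through occurrence times, assuming an event log of (kind, book, time) records. The Event type, the unmatched function and the sample log are hypothetical, and this is not RM/T's mechanism. For simplicity it does not pair each lending event with a distinct return, so a book lent twice and returned once would pass.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    kind: str    # e.g. "lend" or "return"
    book: str    # call number of the book involved
    time: int    # time of occurrence, e.g. a day number

def unmatched(events, first, then):
    """Events of kind `first` with no later event of kind `then` for the same book."""
    return [e for e in events
            if e.kind == first
            and not any(f.kind == then and f.book == e.book and f.time > e.time
                        for f in events)]

log = [Event("lend", "QA76.1", 1), Event("return", "QA76.1", 9),
       Event("lend", "PS3545", 4)]
print(unmatched(log, "lend", "return"))
# [Event(kind='lend', book='PS3545', time=4)]
```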

Primitive operations on a class can be encapsulated

In all languages one can define procedures for manipulating objects in meaningful ways.
For example, in making a Loan we would have to compute the appropriate due date and
set the number of renewals allowed, as illustrated in the DIAL procedure of figure 4-1.

create_proc newLoan(b:Books, p:Borrowers, duration:INTEGER)
  /* creates a new Loan object and initializes its attributes */
  [ initattr(item, b)
    initattr(loanedTo, p)
    initattr(dueDate, add(duration, $Today))
    initattr(renewalsLeft, 2) ]

Figure 4-1: DIAL procedure for making a Loan

Such procedures can effectively maintain semantic constraints that have not been expressed
declaratively. They are particularly valuable since they offer the expressive power of the
programming language together with efficiency in execution, while at the same time being
transparent to the user.
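For readers more at home in a general-purpose language, here is a rough Python analogue of the procedure in figure 4-1. It is only a sketch: new_loan and the Loan record are hypothetical names, dates are represented with Python's datetime.date, and today's date is passed in explicitly instead of being read from DIAL's $Today.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Loan:
    item: str          # the book, represented here by its call number
    loaned_to: str     # the borrower
    due_date: date
    renewals_left: int

def new_loan(book, borrower, duration_days, today):
    """Create a Loan and initialize every attribute in one place."""
    return Loan(item=book,
                loaned_to=borrower,
                due_date=today + timedelta(days=duration_days),
                renewals_left=2)

loan = new_loan("QA76.1", "Smith", 14, today=date(1983, 3, 1))
print(loan.due_date)   # 1983-03-15
```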

However, just defining these procedures does not prevent others from manipulating these
objects through the primitive update operators, such as "insert c in LOANS". In the Binary
Model, GALILEO and DIAL, one can selectively hide the primitive updaters behind a list of
publicly available procedures. As a result, classes behave like abstract data types in these
languages.
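A Python sketch of this encapsulation, with hypothetical names and invented data. Python can only mark the raw member list as private by convention, whereas the languages named above can actually hide the primitive updaters; the point of the sketch is that every insertion goes through make_loan, which applies the same initialization and checks each time.

```python
class Loans:
    """A class whose members can be changed only through its public procedures."""

    def __init__(self):
        # The leading underscore marks the extent as private, by convention only.
        self._members = []

    def make_loan(self, book, borrower, due_date):
        """The sanctioned way to insert a loan: it applies the initialization rules."""
        if any(m["book"] == book for m in self._members):
            raise ValueError(f"{book} is already on loan")
        self._members.append({"book": book, "borrower": borrower,
                              "due_date": due_date, "renewals_left": 2})

    def end_loan(self, book):
        """The sanctioned way to remove a loan."""
        self._members = [m for m in self._members if m["book"] != book]

    def on_loan(self):
        return [m["book"] for m in self._members]

loans = Loans()
loans.make_loan("QA76.1", "Smith", "1983-03-15")
print(loans.on_loan())   # ['QA76.1']
```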

Modeling events by transactions

Transactions,  as  groups  of  primitive  operations,  are  usually  used  in  CMs  to  model 
complex  activities  in  the  world. 

Some  CMLs  aim  to  make  event  descriptions  less  "procedural"  than  in  traditional 
programming  languages.  We  have  observed  earlier  that  conventions  for  operations  on  sets 
or  sequences  allow  one  to  eliminate  explicit  loops,  and  result  in  more  succinct  code.  DIAL 
encourages  the  use  of  "bulk"  operators  on  successive  temporary  sets  of  objects  by 
allowing them to be temporarily augmented with additional attributes. DIAL also concentrates
on simple control constructs which are heavily used in business applications and which lead
to more readable programs.
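The following Python fragment imitates that style under our own assumptions; it is not DIAL syntax, and the data and the attribute name books_out are hypothetical. A temporary set of borrowers is augmented with a computed attribute, and a bulk selection is then applied to the whole set without writing an explicit loop statement.

```python
loans = [("Smith", "QA76.1"), ("Smith", "PS3545"), ("Jones", "QA76.2")]
borrowers = {"Smith", "Jones", "Brown"}

# Step 1: a temporary set of borrowers, each augmented with a computed attribute.
augmented = [{"name": b, "books_out": sum(1 for who, _ in loans if who == b)}
             for b in sorted(borrowers)]

# Step 2: one bulk selection over the whole temporary set, using the new attribute.
active = [row["name"] for row in augmented if row["books_out"] > 0]
print(active)   # ['Jones', 'Smith']
```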

A second concern of some CMLs is uniformity, making the description of activities
similar to that of data objects. In TAXIS, a transaction has parameters, local variables,
preconditions and statements. Once these components are labeled, they can be viewed as
attributes, so that transactions can be seen as entities which belong to classes. This allows
transactions to be organized into subclass hierarchies, and hence procedures can also be
designed by specialization (see Section 5).

Supporting persistent events

Transactions  are  usually  considered  to  be  atomic  activities  corresponding  to  "instantaneous" 
events.  However,  for  many  commercial  applications,  especially  in  the  field  of  office 
automation,  one  needs  to  specify  events  that  last  for  a  longer  period  of  time.  For  example, 
once  a  person  has  been  allowed  to  borrow  a  book  (e.g.,  he  doesn't  have  too  many  other 
books  out),  he  may  return  it  at  any  time,  or  renew  the  loan,  or  he  may  be  sent  an  overdue 
notice.  In  addition,  the  number  of  times  a  book  may  be  renewed  could  be  limited,  and 
library  privileges  may  be  revoked  for  those  borrowers  who  have  books  which  are  long 
overdue.  It  is  clearly  important  to  model  such  activities  and  their  semantics,  but  traditionally 
this  has  been  done  in  a  fragmented  manner,  if  at  all:  an  extra  counter  here,  some  code 
checking things there. Concurrent processes capture such information more appropriately,
but standard languages (e.g., ADA) lack the data management facilities needed to maintain
very large numbers of processes (e.g., one for each loan in a library) and to allow users
to query them (e.g., how many loans have been renewed twice?).

The  pioneering  work  of  Zisman  [Zisman  77]  introduced  the  notion  of  Augmented  Petri 
Nets  to  model  such  activities  in  the  context  of  office  systems.  These  are  essentially 
transition networks of potential activities, where transitions are usually caused by the
occurrence of external events, the passage of time and/or the receipt of internal messages.
This notion was incorporated into the TAXIS framework through scripts [Barron 82],
which are used to model both persistent events and user interaction with the system. The
diagram in figure 4-2 schematically captures parts of the script for lending a book.

script LEND with
  attributes
    b: Books
    p: Borrowers
    dueDate: Date

  [start] --identify b, p--> [1]

  [1] --if loan not permitted--> [stop]
        then inform user
  [1] --if loan permitted--> [2]
        then compute dueDate;
             mark book on loan

  [2] --if receive book back--> [stop]
        then mark book available
  [2] --if receive renewal request--> [2]
        then recompute dueDate
  [2] --if $today > dueDate--> [3]
        then send overdue notice
  [3] --if 7 days later--> ...
        then send another notice


Figure 4-2: A TAXIS script for lending a book


The  models  in  INCOD-DTE  [De  Antonellis  &  Zonta  81],  [Atzeni  et  al.  82]  and  the  Event 
Model  [McLeod  &  King  83],  [King  &  McLeod  83]  also  provide  state-transition  models  of 
event  sequencing. 
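The LEND script of figure 4-2 can be approximated, under several assumptions of our own, by a small Python state machine in which each method is an external event or the passage of time. The class and method names are hypothetical, the 14-day loan and 7-day renewal periods are borrowed from the library example in Section 8, and the renewal limit and the later overdue notices are omitted. A real script would also be stored persistently and be queryable, which this in-memory sketch does not provide.

```python
from datetime import date, timedelta

class LendScript:
    """One loan as a long-lived process; states follow figure 4-2."""

    def __init__(self, book, borrower, permitted, today):
        self.book, self.borrower = book, borrower
        if not permitted:
            self.state = "stop"                     # [1] -> [stop]: inform user
            return
        self.state = "on_loan"                      # [1] -> [2]: mark book on loan
        self.due_date = today + timedelta(days=14)  # compute dueDate

    def receive_book_back(self):
        # Figure 4-2 shows this from [2]; allowing it from [3] is our assumption.
        if self.state in ("on_loan", "overdue"):
            self.state = "stop"                     # mark book available

    def receive_renewal_request(self):
        if self.state == "on_loan":                 # [2] -> [2]
            self.due_date += timedelta(days=7)      # recompute dueDate

    def tick(self, today):
        """The passage of time is itself an event that can fire a transition."""
        if self.state == "on_loan" and today > self.due_date:
            self.state = "overdue"                  # [2] -> [3]: send overdue notice

script = LendScript("QA76.1", "Smith", permitted=True, today=date(1983, 3, 1))
script.receive_renewal_request()
print(script.due_date)       # 1983-03-22
script.tick(date(1983, 3, 25))
print(script.state)          # overdue
script.receive_book_back()
print(script.state)          # stop
```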

Communicating with the user

Although  not  strictly  related  to  the  problem  of  modeling  activities,  this  is  an  appropriate 
place  to  mention  facilities  for  interaction  with  users.  Such  facilities  are  very  important 
since  most  information  systems  are  interactive,  and  much  code  is  devoted  to  this  task. 
The interaction with users takes place in scripts in TAXIS, because it usually involves
delays. The example script above hinted at the way in which the actions of users can
affect the path taken by a script.

DIAL uses the notion of forms and form-filling to develop user interfaces. It provides
ways of describing the syntax of the interaction with the user (e.g., how and when to
prompt for an answer), as well as of specifying what edit-checks to perform on the value and
what actions to take, depending on the values entered. For example, a form could be used
to borrow a book, like filling out a slip in a normal library. According to the description
in Figure 4-3, the user would be prompted, among other things, to enter the call number of the
book, and this would be checked to be a string and to have additional properties; at the end,
the due date of the book would be computed and displayed by the form on the 30th line
of the screen.

port_type Loan_slip
  displayed_attributes
    book (0,0)
      readin, type: STRING
      prompt "Call number of book:"
      restriction: /* check that the book is not on loan already */

    due (30,0)
      output, type: DATE
      /* compute the due date to be displayed */

Figure 4-3: A portion of a DIAL form definition

DIAL  also  aims  to  maintain  a  high  degree  of  uniformity,  and  so  views  forms  as  entities. 
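The edit-check idea can be sketched in Python as follows; Field and fill are hypothetical names and the data are invented, so this is an illustration rather than DIAL's semantics. Each field carries its screen position, an optional prompt and an optional restriction, and an answer is accepted only if it passes the checks.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

@dataclass
class Field:
    name: str
    position: Tuple[int, int]                       # (line, column) on the screen
    prompt: Optional[str] = None                    # None for output-only fields
    check: Optional[Callable[[str], bool]] = None   # edit-check on the entered value

def fill(field, answer):
    """Accept a value typed into an input field only if it passes the edit-checks."""
    if not isinstance(answer, str):
        raise TypeError(f"{field.name}: expected a string")
    if field.check is not None and not field.check(answer):
        raise ValueError(f"{field.name}: {answer!r} rejected by edit-check")
    return answer

on_loan = {"QA76.1"}     # hypothetical call numbers of books already lent out
book = Field("book", (0, 0), prompt="Call number of book:",
             check=lambda call_no: call_no not in on_loan)
due = Field("due", (30, 0))                         # output field, computed later

print(fill(book, "QA76.2"))                         # QA76.2
```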


5.  IS  design  methodologies  and  Conceptual  Models 

CMs were originally intended to capture the semantics of an enterprise in order to serve
as a requirements specification for later stages of database design [McLeod 78], [Lum et
al. 78]. The task of developing large and detailed conceptual models of enterprises which
are accurate, consistent and complete has run into the same complexity problems as those
encountered in developing other large software. The general solution appears to be to
follow some consistent, systematic process for considering the myriad details. Recent
research [Smith & Smith 78], [Borgida et al. 82], [Atzeni et al. 82] suggests that the
constructs offered by CMs provide an important basis for methodologies of
modeling enterprises.

At  least  one  reason  for  this  is  that  the  special  semantic  relations  noted  in  Section  3  are 
intimately  associated  with  so-called  "abstraction  principles".  An  abstraction  principle  selects 
certain  details  of  a  description  as  being  important  or  relevant  at  this  moment;  the  rest  are 
left for a later stage, in a process which iteratively refines the description. Thus, instance-of
corresponds to the classification abstraction, whereby common characteristics shared
by a collection of entities are captured in the description of one entity, the class. The
aggregation abstraction allows one to consider an entity without always having to consider
it as just the sum of its parts, and is supported by the attribute-of relations. Finally,
common characteristics of several classes are factored out by the generalization abstraction,
which of course corresponds to subtype-of. The significance of these abstraction
principles  for  database  design  was  first  demonstrated  by  the  Smiths  [Smith  &  Smith  77], 
and  has  since  been  extended  and  exploited  by  a  number  of  researchers. 

As a simple example, consider how TAXIS uses specialization, the refinement process
corresponding to generalization, to build a portion of the library IS discussed
earlier [Borgida et al. 82]. The most general classes, Books, Borrowers and Loans, are
described first; then successively smaller subclasses of each (e.g., Short_term_loan_books,
Reserve_books, Science_books, Computer_books, Institutions, Faculty, Students,
Graduate_students, ...) are described by providing additional details, until all the types of
entities which are expected to be encountered have been described. As in [Smith & Smith 77],
attributes of more general classes are not restated for subclasses; only the differences and
new attributes are. The next phase is to describe the most general version of the TakeOut
transaction: the rules applicable to anyone borrowing any book. (For a schematic example,
see figure 5-1.)

transaction TakeOut(b:Books, p:Borrowers) with
  prerequisites
    available: {is the book not taken out already?}
    notTooManyOut: {has the borrower more than 5 books out?}
  actions
    makeLoan: {create a Loan}
    bookUnavailable: {mark the book "out"}
    count: {update statistics of interest to library}
end;


Figure 5-1: The most general version of the TakeOut procedure


The hierarchy of classes for the transaction parameters is then used to organize the
description of the transaction into more and more specialized versions. So, one can first
describe borrowing a short-term loan book, as in figure 5-2. Successively, one then
describes TakeOut(p:Institutions), TakeOut(p:Faculty), ..., TakeOut(b:Computer_books, p:Students),
etc., i.e., those situations where things are done differently than in the more general
case. The resulting transaction could have been written as one large, monolithic procedure
with many conditional statements for special cases. The specialization process, however,
decomposes and organizes the design process. Furthermore, if during maintenance some
general rule changes, inheritance automatically applies it to all subcases, thereby ensuring
consistency. In TAXIS, exceptions, exception handlers and scripts are also specified by
descending the IS-A hierarchy of classes.

transaction TakeOut(b:Short_term_loan_books, p:Borrowers) with
  prerequisites
    onlyPeople: {only persons can borrow short-term books}
    onlyOne: {a person can have at most one short-term book out}
  actions
    resetRenewals: {short-term books cannot be renewed}
    computeDueDate: {short-term books circulate for 3 days only}
end;


Figure 5-2: Borrowing short-term books: a more specialized TakeOut
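The effect of organizing TakeOut by specialization can be imitated in Python with subclassing, as a rough analogy rather than TAXIS semantics. In this sketch all class and function names are hypothetical: the specialized version states only its differences and reuses the general prerequisites through super(), and most_specific picks the deepest version whose parameter class matches the book, a selection rule we assume purely for illustration.

```python
class Book: pass
class ShortTermLoanBook(Book): pass
class Borrower: pass
class Person(Borrower): pass
class Institution(Borrower): pass

class TakeOut:
    """Most general version: rules for anyone borrowing any book (cf. figure 5-1)."""
    applies_to = Book
    loan_days = 14

    def prerequisites(self, book, borrower, books_out):
        return {"notTooManyOut": len(books_out) < 5}

class TakeOutShortTerm(TakeOut):
    """Specialized version (cf. figure 5-2): states only what differs."""
    applies_to = ShortTermLoanBook
    loan_days = 3

    def prerequisites(self, book, borrower, books_out):
        checks = super().prerequisites(book, borrower, books_out)   # inherited rules
        checks["onlyPeople"] = isinstance(borrower, Person)
        checks["onlyOne"] = not any(isinstance(b, ShortTermLoanBook) for b in books_out)
        return checks

def most_specific(book, versions=(TakeOut, TakeOutShortTerm)):
    """Choose the deepest version whose parameter class the book belongs to."""
    matching = [v for v in versions if isinstance(book, v.applies_to)]
    return max(matching, key=lambda v: len(v.__mro__))()

t = most_specific(ShortTermLoanBook())
print(type(t).__name__, t.loan_days)    # TakeOutShortTerm 3
print(t.prerequisites(ShortTermLoanBook(), Institution(), []))
# {'notTooManyOut': True, 'onlyPeople': False, 'onlyOne': True}
```

Because the short-term version calls the general one, changing the five-book limit in TakeOut changes it for TakeOutShortTerm as well, which mirrors the maintenance benefit described above.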

In a similar vein, Brodie [Brodie 81] uses the aggregate decomposition of objects
to systematically consider the decomposition of the actions which affect the object. For
example, when a hotel reservation has as parts reservation#, hotel, room, person, arrival
date and departure date, then insert-hotel-reservation creates a reservation#, finds the hotel,
finds and updates the room, inserts the person, and requests and inserts the arrival and
departure dates.
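Brodie's idea can be sketched in Python as a procedure with one step per component of the aggregate. The in-memory dictionary standing in for a database, the field names and the sample data are assumptions for illustration; the arrival and departure dates are passed in rather than requested interactively.

```python
import itertools

_reservation_numbers = itertools.count(1)   # hypothetical source of reservation#s

def insert_hotel_reservation(db, hotel_name, person, arrival, departure):
    """One action per part of the reservation aggregate."""
    number = next(_reservation_numbers)                    # reservation#: create
    hotel = db["hotels"][hotel_name]                       # hotel: find
    room = next(r for r in hotel["rooms"] if r["free"])    # room: find (StopIteration if full)
    room["free"] = False                                   # room: update
    db["persons"].add(person)                              # person: insert
    reservation = {"number": number, "hotel": hotel_name, "room": room["no"],
                   "person": person,
                   "arrival": arrival, "departure": departure}   # dates: insert
    db["reservations"].append(reservation)
    return reservation

db = {"hotels": {"Ritz": {"rooms": [{"no": 101, "free": False},
                                    {"no": 102, "free": True}]}},
      "persons": set(), "reservations": []}
r = insert_hotel_reservation(db, "Ritz", "Ann", "1983-05-01", "1983-05-03")
print(r["number"], r["room"])   # 1 102
```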

Adopting a more general view, McLeod and King [King & McLeod 83] suggest a high-level
description of the enterprise which emphasizes the data and control flow, and from
which one can derive the data classes which will be needed in the IS. Such a high-level
description can properly be said to be a requirements specification for an IS, rather than a
program for it. Conceptual modeling at the requirements specification level has been
advocated for general software systems [Bubenko 80], [Wilson 79], and researchers
associated with the CMLs described in this paper have developed requirements
languages [Greenspan 82], [Kunin 82] which are well suited to being implemented using
the corresponding CML.


6.  Computer  aids  associated  with  CMLs 

Given  the  claim  that  CMLs  can  be  used  to  facilitate  the  development  of  IS,  it  is  natural 
that  researchers  have  attempted  to  develop  automatic  or  semi-automatic  tools  to  assist  the 
IS  designers.  In  general,  these  provide  language  support,  development  aids  and 
implementation  aids. 

The ideal computer tool supporting the use of a CML is a compiler that translates a
conceptual description directly into some computer representation providing data
manipulation efficiency comparable to that of traditional application systems.
Prototypes of such compilers are being developed for ADAPLEX [Chan 82], TAXIS [Nixon
83] and GALILEO (the ADAPLEX storage management system being the most
sophisticated), but none is yet commercially available. Among the open problems is the
global optimization of the more complex consistency checks inherent in CMLs, such as
referential integrity and constraints on the ranges of attributes. It is interesting to note that
each of these groups is following a different approach: ADAPLEX deals with permanent
storage and uses the ADA compiler to implement the rest of the program; TAXIS is
translated into a traditional programming language and DBMS (currently PASCAL/R); GALILEO
will have a full compiler, obtained by modifying the compiler of the ML language, which
motivated the design of GALILEO.

CMLs can, however, be used for the important tasks of prototyping and testing
Information Systems by means of an interpreter in some interactive language such as LISP.
Such an interpreter can verify the syntactic correctness of the specification, and can allow
it to be exercised using small amounts of data. The interpreter can be augmented to
provide a more complete development environment by removing burdensome clerical tasks
from the designer, by performing consistency checks on the evolving design and by guiding
the designer through some of the stages of refinement. For example, the TAXIS full-screen
editor [O'Brien 82] expands the abbreviations introduced by inheritance and checks
that specialization is proceeding properly; the DIALOGO system [Albano & Orsini 83]
allows the description to be queried and modified on-line. Similar interactive development
aids are available for the Event Model [King & McLeod 83], and for an augmented
Entity-Relationship model [Atzeni et al. 82].


7.  Conclusions 

Let  us  summarize  first  the  advantages  of  conceptual  models  from  the  users’  point  of 
view.  We  have  claimed  that  an  IS  is  useful  to  the  extent  that  1)  the  information  it  stores  is 
accurate  and  complete,  and  2)  the  information  is  accessible  to  the  user. 

A CML helps to preserve correctness by capturing the semantics of the world more
accurately. This is done either through the constraints which are built into the language
primitives (e.g., attribute values, integrity of reference), or by allowing the designer to state
semantic constraints not normally available in a DBMS.

CMLs  make  it  easier  to  access  the  information  by 

•  modeling  directly  the  entities  and  activities  in  the  world; 

•  using  the  notion  of  attribute  —  computed  or  stored  —  to  access  directly 
from  one  object  all  relevant  related  objects;  this  is  in  contrast  with  having  to 
navigate  storage  access-paths  or  computed  joins; 

•  supporting  the  concise  manipulation  of  groups  of  entities  through  set-oriented 
operators,  multi-valued  attributes  and  appropriate  conventions; 

•  providing  uniform  access  to  the  generic  as  well  as  the  specific  information 
usually  found  in  the  database. 

From the point of view of IS developers, CMLs advocate an approach which emphasizes
the modelling of the real-world enterprise, as opposed to consideration of implementation
issues. This approach is supported by providing a vocabulary specifically tailored for this
purpose. Furthermore, the designer is aided in the task of gathering, structuring and
maintaining the IS description by the abstraction hierarchies supported by CMLs, as well as
by the associated automated tools.

The literature on Semantic Data Models for Information Systems is extensive, and we
have not referred to the work of research groups led by Bubenko, Langefors, Roussopoulos,
Sowa, Su, Wiederhold and Yao, among many others. The interested reader may locate some
of this work by following pointers in the proceedings and collections listed in the
bibliography, especially [Lum et al. 78], [Brodie & Zilles 81] and [Chen 80].
I. Further details of four CMLs, including an example.


8.  An  example 

The following toy problem will be used to present the features of the four programming
languages surveyed in this paper in a more concrete manner. The problem concerns a
portion of an IS for supporting the activities of a library. We have selected a few specific
facts to be represented in order to highlight the capabilities of the languages under
consideration; for this reason, the example will be very incomplete and will ignore a
number of important real-world distinctions, such as that between a book concept (what it
is about, who wrote it, when it was written, etc.) and particular physical copies of that
book.

For our purposes, the library has a set of books (one copy of each book), each
identified by a unique "call number". Each book has a title, zero or more authors, and
a publisher. The books are grouped by subject matter according to a scheme
specific to this library, and Computer Books form one such subclass; these books have an
ACM Computing Reviews descriptor and are published only by Science Publishers.

Book authors have first and last names; the librarians are expected to use their
knowledge to distinguish the authors of books at the time the book descriptions are
entered into the IS, but the basis of these distinctions (e.g., biographical information in the
book) should not be incorporated in the IS.

Books  may  be  loaned  to  borrowers,  which  may  be  institutions  such  as  other  libraries,  or 
persons,  such  as  students  or  faculty  members.  Lending  books  to  institutions  requires 
additional  processing,  which  will  remain  unspecified  here. 

A book can normally be borrowed for 2 weeks, and two 1-week extensions are
allowed, provided each extension is made before the due date. Some books may be placed
on restricted loan; these can only be taken out for 3 days and cannot be renewed. In
general, any person may have at most 5 books on loan at any time, and at most one
short-term loan book; this requirement may be waived at the discretion of the librarian.
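Because the four language versions encode these rules in different ways, it may help to state them once in plain Python. This is a sketch under our own reading of the text: the constant and function names are hypothetical, "before the due date" is taken to mean strictly before it, an extension is taken to add one week to the current due date, and the librarian's waiver is modeled as a flag that lifts both per-person limits.

```python
from datetime import date, timedelta

REGULAR_DAYS, SHORT_TERM_DAYS = 14, 3     # loan periods
EXTENSION_DAYS, MAX_RENEWALS = 7, 2       # two 1-week extensions
MAX_BOOKS, MAX_SHORT_TERM = 5, 1          # per-person limits

def may_borrow(books_out, short_term_out, short_term, librarian_waives=False):
    """Per-person limits on a new loan; the librarian may waive them."""
    if librarian_waives:
        return True
    if books_out >= MAX_BOOKS:
        return False
    return not (short_term and short_term_out >= MAX_SHORT_TERM)

def due_date(today, short_term):
    return today + timedelta(days=SHORT_TERM_DAYS if short_term else REGULAR_DAYS)

def renew(due, renewals_used, short_term, today):
    """New due date, or None if the renewal is not allowed."""
    if short_term or renewals_used >= MAX_RENEWALS or today >= due:
        return None
    return due + timedelta(days=EXTENSION_DAYS)

d = due_date(date(1983, 3, 1), short_term=False)
print(d)                                             # 1983-03-15
print(renew(d, 0, False, date(1983, 3, 10)))         # 1983-03-22
print(renew(d, 2, False, date(1983, 3, 10)))         # None
```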


8.1.  The  example  in  ADAPLEX 

The ADAPLEX language marries the general-purpose language ADA to the notion of
classes and sets of objects which persist even after the end of the program.
Attributes are viewed as functions from objects to other objects, and are also persistent.
The language provides facilities for set manipulation and for stating quantified formulas over
sets and classes, and incorporates the kinds of notational abbreviations described in Section
3. ADAPLEX also provides the capability to state integrity constraints, such as the use of
null values, membership in multiple classes, and key attributes; arbitrary constraints can be
expressed in a version of first-order logic, although such integrity checking is not yet
implemented.
type authors is
  entity
    first_name: STRING (1..20);
    last_name: STRING (1..40);
  end entity;


type publishers is ...

subtype science_publishers is publishers ...


type borrowers is ...
subtype institutions is borrowers ...
subtype persons is borrowers ...
subtype faculty is persons ...


type books is
  entity
    call#: INTEGER;
    title: STRING (1..120);
    publisher: publishers;
    written_by: set of authors;
  end entity;

unique call# within books;


subtype computer_books is books
  entity
    CRsubject_codes: set of STRING (1..10);
  end entity;

restricted_publishers: integrity for every y in computer_books:
  publisher(y) is in science_publishers;


subtype short_term_loan_books is books
  entity
    expiry_of_restriction: dates;
    requester: faculty;
  end entity;


subtype regular_loan_books is books
  entity
  end entity;

overlap computer_books with short_term_loan_books, regular_loan_books;


subtype books_on_loan is books
  entity
    loaned_to: borrowers;
    due_date: dates;
    renewals_left: INTEGER range 0..2 := 0;
  end entity;


subtype books_available is books
  entity
  end entity;


The  following  must  be  declared  in  a  separate  Ada  package: 


function overdue_books return set of books is
begin
  return {b in books_on_loan where before(due_date(b), $today)};
end overdue_books;


function books_borrowed (p: borrowers) return set of books is
begin
  return {b in books_on_loan where loaned_to(b) = p};
end books_borrowed;


procedure TakeOut (b: in books_available; p: in borrowers) is
  due_on: dates;
begin
  atomic
    if (count(books_borrowed(p)) >= 5) then raise loan_limit; end if;
    if (b is in short_term_loan_books) then
      if (p is not in persons) then raise failure;
      elsif {y in books_borrowed(p) where y is in short_term_loan_books} is not empty
        then raise loan_limit;
      end if;
    end if;

    if (b is in short_term_loan_books)
      then due_on := add(3, $today);
      else due_on := add(14, $today);
    end if;
    move b from books_available into books_on_loan (loaned_to => p,
                                                    due_date => due_on);

    if (b is not in short_term_loan_books)
      then renewals_left(b) := 2;
    end if;
  end atomic;
end TakeOut;


In ADAPLEX, every object must belong to some "terminal class", i.e., one which has no subclasses of
its own. For this reason, it is usually proper to define "complement classes" when defining subclasses.
8.2. The example in GALILEO

GALILEO  also  extends  an  existing  programming  language  to  allow  for  database 
maintenance and manipulation; in this case the language is ML, which is a strongly typed,
functional language, i.e., one in which functions are first-class objects. One of the
fundamental concepts of ML is that of "environment": basically a set of bindings of values
to identifiers; environments can then be added to or subtracted from each other, and new
bindings can be added to or dropped from an environment. GALILEO makes environments
themselves first-class citizens, which can hence be bound to identifiers themselves, and
then  uses  this  powerful  concept  to  define  classes,  databases,  views  and  modules.  GALILEO 
also  introduces  the  notion  of  type  hierarchy,  which  can  be  used  to  define  hierarchies  of 
classes  as  in  other  semantic  models.  GALILEO  achieves  the  most  consistent  integration  of 
general  programming  facilities  with  those  needed  for  IS  applications,  and  is  likely  the  most 
advanced  of  the  CMLs. 
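Python dictionaries give a rough feel for environments as first-class values. This is only an analogy under our own assumptions: add and subtract are hypothetical helper names, the bindings are invented, and real GALILEO environments are statically typed, which plain dictionaries are not.

```python
def add(env, more):
    """Combine two environments; bindings in `more` shadow those in `env`."""
    return {**env, **more}

def subtract(env, names):
    """Drop the bindings of the given identifiers."""
    return {k: v for k, v in env.items() if k not in names}

# Environments are ordinary values, so they can themselves be bound to identifiers.
library = {"loan_days": 14, "max_books": 5}
short_term = add(library, {"loan_days": 3})
databases = {"Library": library, "ShortTerm": short_term}

print(databases["ShortTerm"]["loan_days"])   # 3
print(subtract(short_term, {"max_books"}))   # {'loan_days': 3}
print(library["loan_days"])                  # 14: the original environment is unchanged
```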


use Library :=
  rec authors class
        author <->
          (first-name: string and
           last-name: string)

  and books class
        book <->
          (call#: string and
           publisher: Publisher and
           title: string and
           authors: seq Author)
        key (call#)

  and computer-books subset of books class
        computer-book <->
          (is book ext
           publisher: SciencePublisher and
           CRsubject-code: seq string)

  and short-term-loan-books subset of books class
        short-term-loan-book <->
          (is book and
           expiry-of-restriction: date and
           requester: faculty)

  and books-on-loan partition of books with books-available class
        book-on-loan <->
          (is book and
           loaned-to: borrower and
           due-date: var date and
           renewals-left: default var 0: var (0 or 1 or 2))

  and overdue-books :=
        derived books-on-loan with before(due-date, $today)

  and borrowers class
        borrower <->
          (name: string and
           address: string and
           books-borrowed := derived (all books-on-loan with loaned-to = this))

  and persons class ...
  and institutions class ...


TakeOut (b : book ,  p:borrower)  := 

(if  (count  (books-borrowed  of  p)  >=  5)  then  failwith  loan-limit; 
if  b  alsoin  books-on-loan  then  fail; 
if  b  alsoin  short- term- 1  oan-books 

then  if  not  (p  alsoin  Persons)  then  fail 

else  if  (some  x  in  books-borrowed  of  p 

with  x  alsoin  short-term- loan-books) 
then  failwith  loan- limit; 

use  due-on  := 

if  (b  is  in  short-term-loan-books)  then  add  (3,$today) 
else  add  (14,  $today) 

in 


inBooks-on-loan  (b,b  and  loaned-to  :=  p  and  due-date  :=  var  due-on); 

if  not  (b  alsoin  short-term- loan-books)  then  at  renewals- left  of  b  ;=  2; 

) 
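The checks in TakeOut can be restated outside Galileo to make their order explicit. The following is a minimal Python sketch, not a translation of the Galileo semantics: the classes `Borrower` and `Book`, the exceptions `LoanLimit` and `LoanFailure`, and the function `take_out` are hypothetical names, and `books_borrowed` is stored as a list for brevity, whereas the Galileo schema derives it from `books-on-loan`.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List, Optional


class LoanLimit(Exception):
    """Plays the role of `failwith loan-limit`."""


class LoanFailure(Exception):
    """Plays the role of a plain `fail`."""


@dataclass(eq=False)
class Borrower:
    name: str
    is_person: bool = True  # stands in for membership in `persons`
    # Stored here for brevity; in the Galileo schema it is derived.
    books_borrowed: List["Book"] = field(default_factory=list)


@dataclass(eq=False)
class Book:
    call_no: str
    short_term: bool = False  # membership in `short-term-loan-books`
    loaned_to: Optional[Borrower] = None  # set means "in books-on-loan"
    due_date: Optional[date] = None
    renewals_left: int = 0


def take_out(b: Book, p: Borrower, today: date) -> None:
    if len(p.books_borrowed) >= 5:
        raise LoanLimit(p.name)
    if b.loaned_to is not None:
        raise LoanFailure(f"{b.call_no} is already on loan")
    if b.short_term:
        if not p.is_person:
            raise LoanFailure("short-term loans are for persons only")
        if any(x.short_term for x in p.books_borrowed):
            raise LoanLimit(p.name)
    # Due date: 3 days for short-term books, 14 days otherwise.
    b.due_date = today + timedelta(days=3 if b.short_term else 14)
    b.loaned_to = p
    p.books_borrowed.append(b)
    if not b.short_term:
        b.renewals_left = 2
```

As in the Galileo version, every failure condition is tested before any state is changed, so a failed TakeOut leaves the book and the borrower untouched.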
8.3. A Taxis description

The TAXIS language also considers data design to be of central importance, and thus
adopts the fundamental view shared by all CMs: objects inter-related through properties,
grouped into classes, which are arranged in an IS-A hierarchy. TAXIS, however, introduces
meta-classes as a mechanism for describing objects and meta-information uniformly, and then
extends these ideas in the same way to the description of procedures (transactions), exceptions
and exception handlers. Thus, for example, procedures have as properties their parameters
and statements, and can be specialized to yield IS-A hierarchies of procedures.

In Taxis, one could set up a meta-class MATERIAL_TYPE, which would record
information about the different kinds of books that is uniformly applicable or summary in
nature; for example, loanDuration or the number of books in the class. The class BOOKS,
and any desired subclasses such as SHORT_TERM_LOAN_BOOKS, can have different
values for these attributes.
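Python's own metaclasses give a rough analogue of this idea: the metaclass's instances are classes, so attributes set on them describe a whole class of books rather than any single book. The sketch below is only an illustration under that assumption; `MaterialType`, `Books`, `ShortTermLoanBooks`, `RegularLoanBooks`, `loan_duration` and `number_of_items` are hypothetical names standing in for MATERIAL_TYPE and its attributes.

```python
class MaterialType(type):
    """Hypothetical analogue of MATERIAL_TYPE: its instances are classes.
    Each class records a loan_duration and its number_of_items."""

    def __new__(mcls, name, bases, ns, loan_duration=None):
        cls = super().__new__(mcls, name, bases, ns)
        if loan_duration is not None:
            cls.loan_duration = loan_duration  # one value for the whole class
        cls.number_of_items = 0  # each class keeps its own count
        return cls

    def __init__(cls, name, bases, ns, loan_duration=None):
        super().__init__(name, bases, ns)

    def __call__(cls, *args, **kwargs):
        obj = super().__call__(*args, **kwargs)
        # An instance of a subclass is also an instance of every
        # superclass, so every MaterialType class on the MRO counts it.
        for k in cls.__mro__:
            if isinstance(k, MaterialType):
                k.number_of_items += 1
        return obj


class Books(metaclass=MaterialType):
    def __init__(self, call_no):
        self.call_no = call_no


class ShortTermLoanBooks(Books, loan_duration=3):
    pass


class RegularLoanBooks(Books, loan_duration=14):
    pass
```

With this arrangement, `type(b).loan_duration` plays roughly the part that `minclass (b).loanDuration` plays in the TAXIS transaction TakeOut.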

Since TAXIS permits only "downward" movement of objects in the IS-A hierarchy of
classes, one cannot elegantly represent the lending and return of books through a
BOOKS-ON-LOAN subclass, and we must resort to creating the class LOANS; this does
have the advantage that one can conveniently keep information about past loans, even after
the books are returned.

Also,  since  TAXIS  does  not  allow  set-valued  attributes,  the  relationship  between  authors 
and  books  must  also  be  represented  using  a  separate  class. 
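A separate class is enough because each of its instances records a single (book, author) pair, so no attribute ever needs to hold a set. A minimal Python sketch of that encoding, in the spirit of the WRITTEN_BY class; `Author`, `Book`, `WrittenBy`, `authors_of` and `books_by` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Author:
    first_name: str
    last_name: str


@dataclass(frozen=True)
class Book:
    call_no: int
    title: str


@dataclass(frozen=True)
class WrittenBy:
    # One instance per (book, author) pair; every attribute is single-valued.
    work: Book
    writer: Author


def authors_of(book: Book, links: List[WrittenBy]) -> List[Author]:
    return [w.writer for w in links if w.work == book]


def books_by(author: Author, links: List[WrittenBy]) -> List[Book]:
    return [w.work for w in links if w.writer == author]
```

The same pattern underlies the LOANS class: the association between a book and a borrower becomes an object in its own right, which can then carry attributes such as dueDate.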

dataclass BORROWERS with
  attributes
    name: STRINGS
    address: ADDRESSES
    #OutstandingLoans: 0..5
end BORROWERS;

dataclass INSTITUTIONS isa BORROWERS with ...

dataclass PERSONS isa BORROWERS with
  attributes
    hasShortLoan: {true, false}
end PERSONS;

dataclass FACULTY isa PERSONS with ...

dataclass AUTHORS with
  characteristics
    firstName: STRING;
    lastName: STRING;
end AUTHORS;

metaclass MATERIAL_TYPE with
  attributes
    loanDuration: INTEGERS
    numberOfItems: INTEGERS
end MATERIAL_TYPE;

dataclass BOOKS in MATERIAL_TYPE with
  characteristics
    call#: INTEGERS
    publisher: PUBLISHERS
    title: STRING
  keys
    bookKey: (call#)
end BOOKS;

dataclass WRITTEN_BY with
  attributes
    work: BOOKS
    writer: AUTHORS
end WRITTEN_BY;

dataclass COMPUTER_BOOKS isa BOOKS with
  characteristics
    publisher: SCIENCE_PUBLISHERS
    CRSubjectCodes: STRING
end COMPUTER_BOOKS;

dataclass SHORT_TERM_LOAN_BOOKS in MATERIAL_TYPE isa BOOKS with
  characteristics
    expiryOfRestriction: DATES
    requester: FACULTY
end SHORT_TERM_LOAN_BOOKS;

SHORT_TERM_LOAN_BOOKS.loanDuration <- 3;

dataclass REGULAR_LOAN_BOOKS in MATERIAL_TYPE isa BOOKS;
REGULAR_LOAN_BOOKS.loanDuration <- 14;

dataclass LOANS with
  characteristics
    item: BOOK
    loanedTo: BORROWER
  attributes
    dueDate: DATE
    renewalsLeft: 0..2
  keys
    loanKey: (item, loanedTo)
end LOANS;
TestDefinedClass OVERDUE_LOANS isa LOANS
  attribute test on LOANS, ANY_CLASS is CHECK_DUEDATE;

transaction CHECK_DUEDATE (y: LOANS, C: ANY_CLASS) isa BOOLEAN_FUNCTION
  actions
    if C = OVERDUE_LOANS and Before (y.dueDate, $Today)
    then return (true);
    else return (false);
end;

transaction TakeOut (b: BOOKS, p: BORROWERS) with
  prerequisites
    available: begin
                 get x from LOANS with item = b;
                 if x = nil then true else false;
               end; exc FAILURE
    notTooManyOut: (p.#OutstandingLoans < 5) exc LOANLIMIT (who: p)
  actions
    makeLoan: Insert x in LOANS with item = b, loanedTo = p, renewalsLeft = 2
    setDueDate: x.dueDate <- Add ($Today, minclass (b).loanDuration);
    count: p.#OutstandingLoans <- p.#OutstandingLoans + 1;
end;

transaction TakeOut (b: SHORT_TERM_LOAN_BOOKS, p: BORROWERS) with
  prerequisites
    onlyPeople: (p instance-of PERSONS) exc FAILURE
    onlyOne: (p.hasShortLoan = false) exc LOANLIMIT (who: p)
  actions
    resetRenewals: p.renewalsLeft <- 0;
    setShortLoan: p.hasShortLoan <- true;
end;

Other specializations of TakeOut, such as one for institutional borrowers, would follow.
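The way the specialized TakeOut adds prerequisites and actions to the general one can be imitated with ordinary subclassing, where each override first runs the inherited part through `super()`. This is a rough Python sketch, not TAXIS semantics: the classes `TakeOut` and `TakeOutShortTerm`, the exceptions and the dictionary layout of `db` and of borrowers are hypothetical. One deliberate difference: the sketch resets the renewal count on the new loan record, since renewalsLeft is declared on LOANS.

```python
class LoanLimit(Exception):
    pass


class Failure(Exception):
    pass


class TakeOut:
    """General transaction: check every prerequisite, then run the actions."""

    def prerequisites(self, b, p, db):
        if any(loan["item"] == b for loan in db["loans"]):
            raise Failure(f"{b} is not available")
        if p["outstanding"] >= 5:
            raise LoanLimit(p["name"])

    def actions(self, b, p, db):
        loan = {"item": b, "loaned_to": p["name"], "renewals_left": 2}
        db["loans"].append(loan)
        p["outstanding"] += 1
        return loan

    def run(self, b, p, db):
        self.prerequisites(b, p, db)
        return self.actions(b, p, db)


class TakeOutShortTerm(TakeOut):
    """Specialization for short-term loan books: inherits the general
    prerequisites and actions and adds its own."""

    def prerequisites(self, b, p, db):
        super().prerequisites(b, p, db)
        if not p["is_person"]:
            raise Failure("short-term loans are for persons only")
        if p["has_short_loan"]:
            raise LoanLimit(p["name"])

    def actions(self, b, p, db):
        loan = super().actions(b, p, db)
        loan["renewals_left"] = 0  # reset on the new loan record
        p["has_short_loan"] = True
        return loan
```

A specialization for institutional borrowers would be one more subclass overriding only the parts that differ.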

If storage space is at a premium, the designer may choose to compute certain attributes,
such as #OutstandingLoans or hasShortLoan, rather than store them. This can be
accomplished by defining, for example, the attribute

#OutstandingLoans: COUNT_LOANS

and the transaction COUNT_LOANS, which counts all instances of LOANS with this object
as borrower.


(8) A Test-Defined class, such as OVERDUE_LOANS, has an associated boolean function which picks out
those instances of the superclass that meet certain computationally specified criteria.
transaction COUNT_LOANS (b: BORROWERS) is FUNCTION with
  locals
    ct: INTEGER;
  actions
    ct <- 0;
    for each instance x of LOANS do
      if b = x.loanedTo then ct <- ct + 1;
    return (ct);
end;
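Both ideas, computing #OutstandingLoans on demand and defining OVERDUE_LOANS by a membership test, amount to running a function over the current LOANS instances instead of storing the result. A minimal Python sketch, assuming a plain list stands in for the class extent; `Loan`, `count_loans`, `check_duedate` and `overdue_loans` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Loan:
    item: str
    loaned_to: str
    due_date: date


def count_loans(borrower: str, loans: List[Loan]) -> int:
    """Computed, not stored: counts the loans naming `borrower` (cf. COUNT_LOANS)."""
    ct = 0
    for x in loans:
        if x.loaned_to == borrower:
            ct += 1
    return ct


def check_duedate(y: Loan, today: date) -> bool:
    """Membership test for the overdue loans (cf. CHECK_DUEDATE)."""
    return y.due_date < today


def overdue_loans(loans: List[Loan], today: date) -> List[Loan]:
    # A test-defined class: its members are exactly the superclass
    # instances for which the test returns True.
    return [y for y in loans if check_duedate(y, today)]
```

The trade-off is the one stated above: nothing extra is stored, but every read scans all loans.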

Finally,  it  is  most  likely  that  the  lending  of  books  would  be  modeled  by  scripts  in  TAXIS, 
rather  than  transactions  and  the  class  LOANS,  since  it  involves  long-term  events,  dynamic 
constraints  and  user  communication.  Such  a  script  model  was  sketched  in  Section  4. 
8.4. The example in DIAL.

The fundamental philosophy of DIAL is that a programming language for IS design should
concentrate on features which allow for the succinct expression of the most frequently
recurring patterns in database application programs, at the possible cost of generality, and
that data description ought to be the central aspect of IS development. DIAL has at its
center the conceptual modeling features of the SDM language for modeling data; the
procedural aspects of an IS can be expressed through specialized and restrictive control
structures, rather than the traditional IF-THEN-ELSE conditional and WHILE-loops.

DIAL also incorporates facilities for developing user interfaces through the notion of forms
and form-filling. This is an important concept, since a great deal of business application
programming is concerned with the input and output of information. Forms are also viewed as
objects, and the user is expected to provide values for their attributes; DIAL then provides
special declarative facilities to specify the syntax of the user interaction for filling each
attribute (e.g., how and when to prompt for an answer, what value is expected).
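The form-filling idea can be sketched as a form object whose attributes each carry a small declarative description: a prompt, a parser for the expected value, and whether an answer is mandatory. This is a Python sketch of the concept only, not DIAL's notation; `FieldSpec` and `fill_form` are hypothetical, and `ask` is any callable mapping a prompt to the user's reply, for example the built-in `input`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class FieldSpec:
    name: str
    prompt: str                     # how to prompt for an answer
    parse: Callable[[str], object]  # what value is expected
    mandatory: bool = True


def fill_form(specs: List[FieldSpec], ask: Callable[[str], str]) -> Dict[str, object]:
    """Fill each attribute of a form by prompting through `ask`.
    Re-prompts on a blank mandatory answer or an unparsable one."""
    values: Dict[str, object] = {}
    for spec in specs:
        while True:
            answer = ask(spec.prompt).strip()
            if not answer:
                if spec.mandatory:
                    continue  # a mandatory attribute must be filled
                values[spec.name] = None
                break
            try:
                values[spec.name] = spec.parse(answer)
                break
            except ValueError:
                continue  # wrong kind of value: ask again
    return values
```

Keeping the interaction rules in the field descriptions, rather than in the loop, mirrors the declarative style described above.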

class AUTHORS
  attributes
    first-name
      declaration: STRING
    last-name
      declaration: STRING
      mandatory

class BOOKS
  attributes
    call#
      declaration: ISBN_NUMBERS
      mandatory
    publisher
      value-class: PUBLISHERS
    title
      declaration: STRING
    written-by
      value-class: AUTHORS, multi-valued
  identifiers:
    call#

class COMPUTER_BOOKS
  derivation: subset of BOOKS
  attributes
    CRsubject-code
      declaration: SUBJECT-CODES, multi-valued

class SHORT-TERM-LOAN-BOOKS
  derivation: subset of BOOKS

class LOANS
  attributes
    item
      value-class: BOOKS
    loaned-to
      value-class: BORROWERS
    due-date
      value-class: DATE
    renewals-left
      value-class: 0-to-2

class BORROWERS
  attributes
    books-borrowed
      derivation: item of match to LOANS on loaned-to, multi-valued

class PERSONS derivation: subset of BORROWERS
class INSTITUTION derivation: subset of BORROWERS


create_proc norm.LOAN (b: BOOKS, p: BORROWERS, duration: INTEGER)
  [Initattr (item, b)
   Initattr (loaned-to, p)
   Initattr (due-date, add (duration, Today ()))
   Initattr (renewals-left, 2)]

update_proc NoRenewals (loan: LOANS)
  [updateattr (loan, renewals-left, 0)]

procedure TakeOut (b: BOOKS, p: BORROWERS) {transaction}
  {signals too-many, failure}
  [signal too-many if count (p.books-borrowed) >= 5
   type <- "short" if (b is in SHORT_TERM_LOAN_BOOKS)
   type <- "normal" if not (b is in SHORT_TERM_LOAN_BOOKS)
   signal failure if type = "short" and not (p is in PERSONS)
   signal too-many if type = "short" and
       0 < count (p.books-borrowed and SHORT_TERM_LOAN_BOOKS)
   time <- 14 if type = "normal"
   time <- 3 if type = "short"
   l <- create_LOAN (b: b, p: p, duration: time)
   NoRenewals (loan: l) if type = "short"
  ]


(9) This procedure encapsulates the creation of LOAN instances.
9. A short note on approaches to implementing CMLs

Finally, it may be instructive to consider the different approaches to implementing CMLs
which are currently being pursued.

The ADAPLEX compiler is basically a pre-processor which separates out the portions of
the program dealing with persistent data management, and passes the remainder of the
code, which accesses the data through function calls, to a regular ADA compiler. The
implementors of ADAPLEX have chosen to provide special, sophisticated data structuring
facilities for the database, thus taking full advantage of the object-oriented nature of the
CML and the resulting database, but had to avoid touching the ADA compiler, which is not yet
available.

In  contrast,  the  implementors  of  GALILEO  are  modifying  an  actual  compiler  for  the 
language  ML,  on  which  GALILEO  is  based.  As  a  result,  GALILEO  can  provide  more 
uniformity  in  the  persistence  of  the  various  datatypes,  since  their  run-time  implementation 
is  known. 

In order to decrease the total effort required in writing the compiler, the implementors
of TAXIS chose to translate TAXIS programs entirely into a language which provides
traditional data management facilities, namely PASCAL/R. A similar translation could have been
provided into PL/I plus calls to some other relational DBMS. Such an approach allows
TAXIS to take advantage of the considerable work on optimization in DBMSs, but at the
cost of not being able to fine-tune this optimization for an object-oriented database. A similar
philosophy was adopted by Kulkarni [Kulkarni 83], who implemented an extension of the
DAPLEX language in PS-Algol ([Atkinson et al 82], [Atkinson et al 83]), a language which
allows for the storage of persistent data.

In conclusion, none of the languages considered above appears to be clearly superior to
the others; in fact, there is room for a language which would capture the best of the ideas
present in each: the functional treatment of attributes, especially sets, in
ADAPLEX; metaclasses, procedural specialization and scripts in TAXIS; derived classes and
attributes, as well as forms for information input, in DIAL; and the modularization facilities
and rich but uniform type structure of GALILEO.
Abstract

The basic abstraction mechanisms of Semantic Data
Models - aggregation, classification and generalization -
are considered the essential features to
overcome the limitations of traditional data
models in terms of semantic expressiveness. An
important issue in database programming language
design is which features a programming
language should have to support the abstraction
mechanisms of Semantic Data Models. This paper shows
that when a strongly typed programming
language is used, that language should support the notion
of type hierarchies to achieve a full integration
of the abstraction mechanisms of Semantic Data Models
within the language's type system. The solution is
presented using the language Galileo, a strongly
typed, interactive programming language specifically
designed for database applications.


1. INTRODUCTION

In the past the fields of programming languages
and database languages have developed separately
because each focused on different classes of
problems. Research in programming language design
has concentrated on the creation of features to
support the implementation of complex algorithms
using temporary data (Shaw 80). Research in
database language design, instead, has been mainly
concerned with features to model persistent,
interrelated data which must be accessed by
programs or interactive query languages. An
important exception to these divergent trends has
been the common attempt by both programming language and
database researchers to design a basic set
of abstraction mechanisms for data modeling. Still,
the solutions provided have been quite distinct
(Biller 78; Brodie 80, 81; Schmidt 78; Weber 78).

Recently, this situation has been changing, largely
because database people are paying more attention
to the design of languages that, besides types,
abstract types and modularization, include abstraction
mechanisms to support database models. For
instance, proposals and implementations have been
given to integrate a relational data model into a
general-purpose, Pascal-like programming language
(ASTRAL (Amble 79), PASCAL-R (Schmidt 80), PLAIN
(Wasserman 79), RIGEL (Rowe 79), THESEUS (Shopiro
79)).

Another database goal, which will have far-reaching
impact on programming languages, is the design of a
language for database applications which supports
the basic features of Semantic Data Models. A
Semantic Data Model is a set of data abstraction
mechanisms to describe the structure of databases:
the structures, and the associated operations, are
explicitly intended to represent certain types of
real-world information. A survey and an analysis of
the motivations for this new generation of data
models is reported in (McLeod 82). It is sufficient
here to recall that the basic abstraction mechanisms
are classification, aggregation and generalization.
For the purposes of this paper, we are
interested in the third mechanism,
also called the IS-A hierarchy, which was originally proposed
in the context of Semantic Networks.
The IS-A hierarchy is used in Semantic Data Models
as a definitional mechanism involving two different
notions (Wong 77). First, supposing Students IS-A
Persons, it establishes an existence constraint
between the elements of Students and Persons present
in the database: The elements of Students are in
every state a subset of the elements of Persons
(Extensional IS-A Constraint). Secondly, it is a
compatibility rule between the elements of Students
and Persons, in that every element of Students
inherits all the properties of Persons elements
(Structural IS-A Constraint). The inheritance rule,
with this interpretation of the IS-A mechanism, is
therefore strict rather than default (e.g.
(Carbonell 81)). Consequently, elements of Students
can be used in any context where an element of
Persons is expected, but not vice versa (the
compatibility rule is a partial order).
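The two notions can be checked independently: one compares the current extents of the two classes, the other compares their property lists. A minimal Python sketch, assuming classes are represented as sets of element identifiers and property tables map property names to type names (both encodings, and the function names, are hypothetical). Since the extensional constraint must hold in every state, a DBMS would have to re-check it after each update; the sketch examines a single state.

```python
from typing import Dict, Set


def extensional_isa(sub: Set[str], sup: Set[str]) -> bool:
    """Extensional IS-A constraint: in the current state, the elements of
    the subclass form a subset of the elements of the superclass."""
    return sub <= sup


def structural_isa(sub_props: Dict[str, str], sup_props: Dict[str, str]) -> bool:
    """Structural IS-A constraint with strict inheritance: the subclass has
    every property of the superclass, with the same type."""
    return all(sub_props.get(p) == t for p, t in sup_props.items())
```

Both relations hold in one direction only, which is the partial-order behaviour noted above.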

The question is which features a programming
language should have to support this abstraction
mechanism. If we think, for simplicity, of Students
and Persons as identifiers bound to collections of
values of type Student and Person, the first notion
behind the IS-A hierarchies is a constraint on the
values of the identifiers, while the second is a
compatibility rule between the types of the
elements.

Three of the database programming languages that
have been proposed to address this problem are
TAXIS, ADAPLEX and DIAL. TAXIS, which has the merit
of being the first proposed, uses an approach not
based on a typed programming language (Mylopoulos
80). More closely related to the author's work is
ADAPLEX, since the solution proposed is given
within the framework of a strongly typed programming
language, in this case ADA (Smith 81, Wegner
80). We consider this approach more interesting
because we believe that the well known benefits of
static typechecking are notable for database
applications: The task of modeling becomes easier
and more productive (Brodie 80, Biller 78). The
solution adopted by ADAPLEX, however, is ad hoc for
modeling databases, and it is not an independent
feature of the language that can be used also for
modeling temporary data. Similar considerations
apply to DIAL (Hammer 80), which has evolved from
SDM (Hammer 81): It is a programming language with
data types, but the features for database modeling
are not integrated with the data type system. We
claim that a better solution could be achieved if
the type system of the language supported the
notion of type hierarchies.

We will discuss the solution adopted in Galileo, a
strongly typed, interactive programming language
which integrates Semantic Data Model abstraction
mechanisms into the framework of the language
Edinburgh-ML (Gordon 79b, Albano 82). In particular,
Galileo provides two independent features: 1)
a type system with type hierarchies, and 2) the
class mechanism to deal with databases. When these
features are combined in defining derived classes,
an IS-A hierarchy is modeled.

A  complete  description  of  Galileo  is  outside  the 
scope  of  this  paper;  it  has  been  given  in  (Albano 
82)  and,  together  with  the  denotational  semantics, 
in  (Capaccioli  83).  In  the  next  section  we  give  an 
overview  of  the  language.  Section  3  describes  the 
notion  of  type  hierarchies,  and  in  Section  4  we 
present  the  class  mechanism  to  deal  with  databases 
and  to  model  the  IS-A  hierarchies. 

2.  OVERVIEW  OF  Galileo 

Galileo  is  not  a  Semantic  Data  Model,  but  it  is  a 
strongly  typed  programming  language  which  supports 
the  following  abstraction  mechanisms  of  Semantic 
Data  Models  to  design  a  database  application: 

Classification: Entities of the world being
modeled that share common characteristics are
described by the type of the elements of a class.
The name of the class denotes the elements
currently present in the database. The elements of
a class are represented uniquely; no copies of
them are allowed.

Aggregation: Elements of classes are aggregates,
i.e. they are abstractions of heterogeneous
components and may have elements of other classes
as components. Associations among entities are
represented by aggregations in a Galileo database.
Components of elements of classes can be collections
of homogeneous values to represent, for
example, multivalued associations among entities.
Moreover, because of the unique representation of
elements of classes, any modification of an
element is reflected anywhere that element appears
as a component.

Generalization: Elements of a class can be
described in different ways by means of derived
classes. Elements of a derived class also belong
to the parent class from which the class is
derived using a predefined set of operators. The
derived classes mechanism includes the IS-A
hierarchy of Semantic Networks and Semantic Data
Models.

Modularization: Data and operations can be
partitioned into interrelated modules. Therefore, a
complex schema can be structured into smaller,
meaningful and manageable units. For instance, a
unit may model a user view or a description of the
schema produced by a stepwise refinement
methodology by specialization.

Other  features  of  Galileo  are: 

1. It is an expression-oriented language, in that
each construct is applied to values to return a
value. This feature is interesting because it
allows the interactive use of Galileo without
resorting to a new, stand-alone query language.

2.  It  is  higher  order,  in  that  functions  are 
denotable  values  of  the  language.  Therefore,  a 
function  can  be  a  component  of  an  aggregate 
which  represents  an  entity,  e.g.  an  age  may  be 
described  as  a  function  of  the  birthdate. 

3.  Every  denotable  value  of  the  language  possesses 
a  type: 

a.  A  type  is  a  set  of  values  sharing  common 
characteristics,  together  with  the  primitive 
operators  which  can  be  applied  to  these 
values. 

b. The predefined types of the language are
bool, num, int, string, equipped with the
usual operators, and the type null, which is
a singleton set with the element nil,
equipped with the equality operator.

c. The type constructors available to define
new type names, from predefined or previously
defined types, are: tuple (record),
sequence,  discriminated  union  (variant), 
function,  modifiable  value  (reference),  and 
abstract  types.  There  are  two  constructors 
for  abstract  types:  <->  and  <-►  .  The  former 

is  similar  to  CLU  clusters  (Liskov  77), 
ALPHARD  forms  (Shaw  77,  81)  or  Euclid 

modules  (Lampson  77).  It  is  used  to  define  a 
new  type  together  with  the  operations 
available.  The  latter  is  similar  to  the  type 
constructor  of  Ada:  it  defines  a  new  type 
which  inherits  the  primitive  operations  of 
the  representation  type. 

d.  The  type  system  supports  the  notion  of  type 
hierarchy,  in  that  if  a  type  t  is  a  subtype 
of  a  type  t',  then  a  value  of  t  can  be  used 
as  argument  of  any  operation  defined  for 
values  of  t',  but  not  vice  versa  because  the 
subtype  relation  is  a  partial  order.  The 
type  hierarchy  is  a  directed  acyclic  graph 
instead  of  a  simple  tree. 

4. Every Galileo expression has a type. The
meaning of "an expression e having type t" is
that the value of e possesses the type t. In
general, any expression has a type that can be
statically determined, so that every type
violation can be detected by textual inspection
(static type checking). However, if the type
checker is not able to ascribe a type to an
expression, the user must specify the type with
the notation "Expression: Type". The language
has been designed to be statically type
checkable for two reasons: First, for the
considerable benefits in testing and debugging;
secondly, because programs are safely
executed disregarding any information about
types at run time. Execution-time testing will
be required for constraints only. Finally,
static type checking allows the typechecker to
give the correct meaning to overloaded operators,
i.e. operators which can be used with
operands of different types.

5.  Class  elements  possess  an  abstract  type  and  are 
the  only  values  which  can  be  destroyed. 
Predefined  assertions  on  classes  are  provided 
and,  if  not  otherwise  specified,  the  operators 
for  including  or  eliminating  elements  of  a 
class  are  automatically  defined. 

6. A structured control construct is provided for
failures and their handling.

The following definition of a simple schema
illustrates Galileo. The example concerns departments
and employees in a firm. The definitions are
collected in the Organization schema.

Organization :=
  (rec Departments class
         Department <-> (Name: string
                         and Budget: var num
                         and Address: Address
                         and Manager: var Employee
                         and Employees: var seq Employee)
         key (Name)
   and Employees class
         Employee <-> (Name: string
                       and Salary: var num
                       and Dept: Department)
         key (Name)
   and NewEmployee (Name: string,
                    Salary: num,
                    NameOfDept: string) : Employees :=
         use ADept :=
               get Departments
               with Name = NameOfDept
               if-fails failwith "unknown dept."
         ext AnEmployee :=
               mkEmployee (Name := Name
                           and Salary := var Salary
                           and Dept := ADept)
         in
           (Employees of ADept :=
                Employees of ADept append [AnEmployee];
            AnEmployee)
   and VipEmployees subset of Employees class
         VipEmployee <-> (is Employee
                          and VipProperty: string)
   and type Address := (Street: string
                        and Zip: string
                        and City: string)
   drop mkEmployee
  )

The rec is used for recursive functions or for
mutually dependent types, such as Department and
Employee.

Departments and Employees are examples of base
classes, while key is an example of a predefined
constraint, asserting that the elements of the
classes must differ in the value of the Name
attribute.

An attribute can be modified if and only if it is
declared with a var type; otherwise it is constant, and
any attempt to update its value is detected
statically.

The function NewEmployee is an example of a
defined operation included in the schema. It is
the only operation which can be used to create new
elements of the class Employees, since the drop
operator prevents the predefined mkEmployee operation
from being exported outside the schema
definition. For Departments and VipEmployees the
functions mkDepartment and mkVipEmployee are
available.
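A rough Python analogue of NewEmployee, assuming dictionaries stand in for Galileo class elements and a leading underscore stands in, loosely, for `drop` (Python cannot actually hide the method). The names `Organization`, `_mk_employee`, `new_employee` and `UnknownDept` are hypothetical. Unlike the Galileo version, which builds a new sequence with `append`, the sketch extends the department's list in place; in both cases the employee object itself is shared, not copied.

```python
from typing import Dict


class UnknownDept(Exception):
    pass


class Organization:
    def __init__(self) -> None:
        self.departments: Dict[str, dict] = {}  # indexed by key (Name)
        self.employees: Dict[str, dict] = {}    # indexed by key (Name)

    def _mk_employee(self, name: str, salary: float, dept: dict) -> dict:
        # Leading underscore: by convention not exported, loosely like `drop`.
        if name in self.employees:
            raise ValueError(f"key (Name) violated: {name}")
        e = {"Name": name, "Salary": salary, "Dept": dept}
        self.employees[name] = e
        return e

    def new_employee(self, name: str, salary: float, name_of_dept: str) -> dict:
        try:
            a_dept = self.departments[name_of_dept]
        except KeyError:
            raise UnknownDept("unknown dept.") from None
        e = self._mk_employee(name, salary, a_dept)
        # Shared, not copied: the department holds the same object.
        a_dept["Employees"].append(e)
        return e
```

Because the element is shared, a later change such as a salary update is visible both through the Employees class and through the department's Employees component.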

VipEmployees is an example of a derived class. It
contains all those employees who are believed to
be very important. The elements of a derived class
must have a type which is a subtype of the type of the
elements of the parent class. For instance, the
type of the elements of VipEmployees is that of
Employee with the additional attribute VipProperty.

This example shows how classes are used to deal
with sets of interrelated objects. The approach
has some similarity to that adopted for relational
databases: In both cases the associations among
data are described by means of the value of an
attribute. However, in relational databases data
are tuples of simple values, collected in relations,
and the associations among them are
represented by assigning as value to an attribute
the key value of another tuple. In Galileo,
instead, the mechanism of "data sharing" is used
to represent associations, so that an element of a
class can be shared as a component by many others.

3.  TYPE  HIERARCHIES 

An important property of Galileo is the notion of
subtype: if a type u is a subtype of a type v (u
is v), then a value of the type u can be used in
any context where a value of the type v is
expected, but not vice versa, i.e. the subtype
relation is a partial order. For instance, if a
function f has a formal parameter of type v, then
an application of f to a value of type u is
correctly typechecked because no run time errors
can occur. It is important to stress the point
that, since Galileo has a secure type system, the
notion of type hierarchies is related to that of
well typed expression (Gordon 79a): Expressions
which are syntactically well-typed are always
semantically well-typed, i.e. the expressions do
not cause run-time type errors and give a value of
the correct type, if they terminate. In Milner's
words (Milner 78), "well-typed expressions do not go wrong",
and this still holds with hierarchies among types.

This notion of type hierarchies is different from
the subtype concept of ADA, which is essentially a
mechanism to give another name to a type whose
set of values has been constrained, but is similar
to the subclass mechanism of Simula 67 (Birtwistle
73) and Smalltalk (Ingalls 78). The interesting
aspect of the way it is used in Galileo is that
this notion is extended to all the types, in the
sense explained in the sequel, while preserving
the important property that the language is still
strongly typed.

With this mechanism Galileo supports the notion of
programming by data specialization, originally
introduced by Simula 67 and generalized in TAXIS
to all the constituents of a database application:
Data, transactions, assertions and scripts
(Borgida 82). Complex software applications,
especially those employing databases, can be
designed and implemented incrementally: Once a set
of functions has been designed and tested for the
most general data, they can still be used with
data of any subtype introduced later on in the
software development process. Moreover, new functions
on the subtypes can be defined by the
composition of the old functions with specific
expressions.
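A small Python illustration of programming by data specialization, assuming dataclass inheritance as a stand-in for Galileo's subtypes; `Person`, `Student`, `mailing_label` and `student_label` are hypothetical names. Python does not check these annotations itself; a static checker such as mypy would accept passing a `Student` where a `Person` is expected.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    address: str


@dataclass
class Student(Person):  # every Student is also a Person
    school: str


def mailing_label(p: Person) -> str:
    # Written and tested once, for the most general type.
    return f"{p.name}, {p.address}"


def student_label(s: Student) -> str:
    # A new function on the subtype, composed from the old one.
    return f"{mailing_label(s)} ({s.school})"
```

The general function needs no change when the subtype is introduced; only the new behaviour is written, and it reuses the old function.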

The type system of Galileo includes primitive
types and constructors to introduce user defined
types, both concrete and abstract. For concrete
types the type equivalence rule is the so-called
structural equivalence: User-defined type names
are just used as an abbreviation for the structure
they represent. For abstract types the type
equivalence rule is the so-called name equivalence
rule: Two user-defined types are always different,
and are different from the representation type.
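Structural equivalence for concrete types, name equivalence for abstract types, and the subtype rules 1 to 6 for concrete types stated in this section (Albano 82) can be captured together in a small checker. The sketch below is a Python illustration under a hypothetical encoding of types as tagged tuples; it is not Galileo's typechecker, and `is_subtype` is an illustrative name.

```python
# Hypothetical encoding of Galileo types as tagged Python tuples:
#   ("base", "string")         predefined type
#   ("abstract", "Time")       abstract type: compared by name only
#   ("tuple", {label: type})   (l1: t1 and ... and ln: tn)
#   ("variant", {label: type}) <l1: t1 or ... or ln: tn>
#   ("seq", t)                 seq t
#   ("var", t)                 var t
#   ("fun", arg, result)       arg -> result
# Concrete types compare structurally (Python == on the encoding), so a
# user-defined name is just an abbreviation for its structure.


def is_subtype(r, s):
    """Return True iff (r is s) under rules 1-6 for concrete types."""
    if r == s:  # rule 1: (t is t)
        return True
    kind = r[0]
    if kind != s[0]:
        return False
    if kind == "tuple":  # rule 2: r has at least the labels of s
        return all(lab in r[1] and is_subtype(r[1][lab], t)
                   for lab, t in s[1].items())
    if kind == "variant":  # rule 3: r has at most the labels of s
        return all(lab in s[1] and is_subtype(t, s[1][lab])
                   for lab, t in r[1].items())
    if kind == "seq":  # rule 4: element types compared covariantly
        return is_subtype(r[1], s[1])
    if kind == "var":  # rule 5: the associated types must be the same
        return r[1] == s[1]
    if kind == "fun":  # rule 6: argument contravariant, result covariant
        return is_subtype(s[1], r[1]) and is_subtype(r[2], s[2])
    return False  # base and abstract types: only rule 1 applies
```

Applied to the Address, VipAddress, Person, Student and VipPerson declarations of this section, the checker reports (VipAddress is Address) and (Student is Person), but not (VipPerson is Person), because rule 5 requires the types under var to be identical.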

User defined concrete types are tuples (records),
sequences, discriminated unions (variants), modifiable
values (references) and functions. For
these types the subtype relation is automatically
inferred by the typechecker according to the
following rules (Albano 82):

1. For any type t, (t is t).

2. If r and s are tuple types, of the form "(l1:t1
and ... and ln:tn)", then (r is s) iff:

a. The set of labels of r contains the set of
labels of s, and

b. if r' and s' are the types of a common
label, then (r' is s').

3. If r and s are variant types, of the form "<l1:t1
or ... or ln:tn>", then (r is s) iff:

a. The set of labels of r is contained in the
set of labels of s, and

b. if r' and s' are the types of a common
label, then (r' is s').

4. If r and s are sequence types, of the form "seq
t", with elements of types r' and s', then (r is
s) iff (r' is s').

5. If r and s are modifiable types, of the form
"var t", then (r is s) iff the associated
types are the same.

6. If (r -> s) and (r' -> s') are function types,
then (r -> s) is (r' -> s') iff (r' is r), and (s
is s').


For instance, if

type (Address := (Street: string
                  and Zip: string)
      and VipAddress := (Street: string
                         and Zip: string
                         and Country: string)
      ext Person := (Name: string
                     and Address: var Address)
      and Student := (Name: string
                      and Address: var Address
                      and School: string)
      and VipPerson := (Name: string
                        and Address: var VipAddress))


... types and will be presented by an example:

type Time (Hours: int and Minutes: int)

This  declaration  introduces: 

1.  The  new  type  Time  with  a  domain  isomorphic  to 
tuples. 

2.  the  identifiers  mkTime  and  repTime  bound  to  two 
primitive  functions,  automatically  declared,  to 
map  values  of  the  representation  type  into  the 
new  one,  and  vice  versa. 

3.  The  selectors  "Hours  of"  and  "Minutes  of",  wich 
are  primitive  operators  on  the  representation 
type.  That  is  to  say,  primitive  operators  are 
inherited  by  the  new  type,  with  their  names, 
but  this  overloading  does  not  introduce  ambi¬ 
guities  because  the  typechecker  can  infer  the 
meaning  of  an  operator  from  the  type  of  the 
operands.  A  feature  is  also  provided  to 
restrict  the  set  of  operators  to  be  inherited 
and  to  include  assertions  to  be  tested  at  run 
time  (Albano  82). 

For  abstract  types  the  subtype  relation  must  be 
explicitly  declared  to  the  typechecker  as  follows: 

Id  is  Id'  «-♦  t,  where  Id'  ♦-*  t'  and  (t  is  t'  ) 

For  instance: 

type  ( Person  (Name:  string 

and  BirthDate:  string 
and  Address:  string) 

ext  Student  is  Person  *-► 

(Name:  string 
and  BirthDate:  string 
and  School :  string 
and  Address:  string)) 

The  following  abbreviation  emphasizes  the  fact 
that  the  subtype  Student  inherits  the  attributes 
of  the  supertype  Person: 


then 

Student  is  Person,  and 
VipAddress  is  Address 

while  it  is  false  that 

Person  is  VipPerson 
Person  is  Student, 
Student  is  VipPerson, 
VipPerson  is  Person,  and 
VipPerson  is  Student 


To  define  abstract  types,  Galileo  provides  two 
constructors.  One,  which  will  not  be  discussed 
here,  is  similar  to  CLU  clusters,  Alphard  forms 
and  Euclid  modules.  The  other  is  similar  to  ADA 


type  Student-*-*  (is  Person 

and  School:  string) 

Finally,  multiple  hierarchies  are  declared  as  Id 
is  Id',  Id"*-*  t,  where  (t  is  t ' )  and  (t  i3  t")  or 
in  the  abbreviated  form  "Id*-*  is  Id',  Id",  ...". 
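The subtype rules above lend themselves to a direct recursive check. The following Python sketch is illustrative only and is not the Galileo typechecker: the type classes (`Prim`, `Rec`, `Variant`, `Seq`, `Var`, `Fun`, `Abstract`) and the `declare` registry are hypothetical names. Concrete types are compared structurally by rules 1 to 6. Abstract types are compared by name, and a subtype holds only when it has been declared, with the declaration rejected unless the representation types satisfy (t is t').

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Prim:          # primitive type, e.g. int or string
    name: str

@dataclass(frozen=True)
class Rec:           # tuple type (l1: t1 and ... and ln: tn)
    fields: tuple    # ((label, type), ...)

@dataclass(frozen=True)
class Variant:       # variant type <l1: t1 or ... or ln: tn>
    fields: tuple

@dataclass(frozen=True)
class Seq:           # seq t
    elem: object

@dataclass(frozen=True)
class Var:           # var t, a modifiable value
    elem: object

@dataclass(frozen=True)
class Fun:           # r -> s
    arg: object
    res: object

@dataclass(frozen=True)
class Abstract:      # abstract type, compared by name only
    name: str

REPS: dict = {}      # abstract type name -> representation type
PARENTS: dict = {}   # abstract type name -> declared supertype names

def declare(name, rep, *parents):
    """Declare `name <-> rep`, optionally `is` one or more parents."""
    for p in parents:
        if not is_sub(rep, REPS[p]):          # Id is Id' requires (t is t')
            raise TypeError(f"{name} is {p}: representation is not a subtype")
    REPS[name], PARENTS[name] = rep, tuple(parents)
    return Abstract(name)

def equivalent(a, b):
    return is_sub(a, b) and is_sub(b, a)

def is_sub(r, s):
    """True if (r is s) under rules 1-6, plus declared subtyping of abstract types."""
    if r == s:                                                  # rule 1
        return True
    if isinstance(r, Rec) and isinstance(s, Rec):               # rule 2
        rf, sf = dict(r.fields), dict(s.fields)
        return sf.keys() <= rf.keys() and all(is_sub(rf[lab], sf[lab]) for lab in sf)
    if isinstance(r, Variant) and isinstance(s, Variant):       # rule 3
        rf, sf = dict(r.fields), dict(s.fields)
        return rf.keys() <= sf.keys() and all(is_sub(rf[lab], sf[lab]) for lab in rf)
    if isinstance(r, Seq) and isinstance(s, Seq):               # rule 4
        return is_sub(r.elem, s.elem)
    if isinstance(r, Var) and isinstance(s, Var):               # rule 5: same type required
        return equivalent(r.elem, s.elem)
    if isinstance(r, Fun) and isinstance(s, Fun):               # rule 6: contravariant argument
        return is_sub(s.arg, r.arg) and is_sub(r.res, s.res)
    if isinstance(r, Abstract) and isinstance(s, Abstract):     # name-based, declared only
        return any(is_sub(Abstract(p), s) for p in PARENTS.get(r.name, ()))
    return False

# The concrete types of the example in the text
S = Prim("string")
Address = Rec((("Street", S), ("Zip", S)))
VipAddress = Rec((("Street", S), ("Zip", S), ("Country", S)))
Person = Rec((("Name", S), ("Address", Var(Address))))
Student = Rec((("Name", S), ("Address", Var(Address)), ("School", S)))
VipPerson = Rec((("Name", S), ("Address", Var(VipAddress))))
```

Rule 5 is what makes `Person is VipPerson` and `VipPerson is Person` both false. The two types differ only in a `var` attribute whose associated types are not the same. This prevents a `VipAddress` slot from being overwritten with a plain `Address` through a `Person` view.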


4.  CLASSES 


Classes are the mechanism to represent a database
by means of sets of modifiable interrelated
objects. An element of a class is an object which
is the computer representation of certain facts
about an entity of the world that is being
modeled. An object-oriented view of a database is
characterized by the following (Borgida 82, Kent
79, McLeod 82):

1.  There  is  a  one-to-one  correspondence  between 
objects  in  the  database  and  entities  of  the 
world  which  are  being  modeled. 

2.  The  objects  of  the  database  are  all  distinct 
and  they  might  not  have  an  external  reference, 
such  as  a  key,  that  stands  for  them. 

3.  Associations  among  entities  are  modeled  by 
relating  the  corresponding  objects  and  not 
external  references.  Moreover,  only  objects 
that  exist  in  the  database  can  be  used  to  model 
associations. 

A class is characterized by a name and the type of
its elements. The name of a class denotes the
elements of the class currently present in the
database, while the type gives the structure of
the elements. The type of the class elements must
be an abstract type; therefore two elements of
different classes are always of different type,
although they may be defined to have the same
representation.

Elements  of  classes  are  the  only  values  in  Galileo 
which  can  be  destroyed.  Moreover,  they  are 
uniquely  represented  and  when  updated,  their 
modification  is  reflected  in  all  other  objects  in 
which  they  appear  as  components. 

Each  class  can  be  either  a  base  class  or  a  derived 
class.  A  base  class  is  defined  independently  of 
other  classes,  while  a  derived  class  is  defined  in 
terms  of  other  classes.  As  in  SDM  (Hammer  81),  a 
base  class  is  used  to  model  a  primitive  collection 
of  entities,  while  a  derived  class  is  used  to 
model  alternative  ways  of  looking  at  the  same 
entities. 

Base  Classes 

A  base  class  is  defined  by  the  environment 
operator  class,  as  shown  in  the  following  example 
with  two  mutually  defined  classes. 

    rec Departments class
          Department <->
            (Name: string
             and Budget: var num
             and Address: string
             and Manager: var Employee
             and Employees: var seq Employee)
          key (Name)
    and Employees class
          Employee <->
            (Name: string
             and Salary: var num
             and Dept: Department)
          key (Name)


The class operator introduces the following
bindings:

1. The names Department and Employee, bound to new
types isomorphic to tuples.

2. The class identifiers Departments and Employees,
bound to modifiable sequences of values of
types Department and Employee.

3. The names mkDepartment and mkEmployee, bound to
two primitive functions, automatically declared,
which differ from the similar functions on
abstract types in that every time they are
applied, new elements are created and are also
automatically inserted into the associated
classes, if the specified constraints are not
violated. The constructed elements are also the
values returned by the functions.

4. The functions repDepartment and repEmployee, which
map elements of the classes into the representation
type.

The above declaration defines the structure of the
objects together with a few constraints, some of
which are predefined constraints to be tested when
a class is modified:

a. The key constraint asserts that elements of a
class must differ in the value of certain
attributes. Note that if the key constraint is
not specified, the insertion will be made even
though the values of the attributes are equal to
those of another object already present in the
class. That is, elements of classes are always
distinct objects, but the construction of an
element will fail when the constraints are
violated.

Other constraints are specified directly in the
definition of element types:

b. Only attributes with a var type can be
modified.

c. The attributes Employees and Manager in
Departments are used to model the part-of
relationship of Semantic Networks, which implies the
following dependency constraint: An employee
cannot be eliminated from the database as long
as he is a component of a department.
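The behaviour of mkDepartment and mkEmployee and of constraints a to c can be mimicked in Python. This is a minimal sketch under stated assumptions, not Galileo's implementation. `Element`, `BaseClass`, `mk`, `delete` and `part_of` are hypothetical names. The part-of dependency is declared by hand as (owner class, attribute) pairs, and `None` is used as a placeholder so that the mutually dependent Department and Employee objects can be built one after the other.

```python
class Element:
    """An element of a class: a distinct object whose `var` attributes alone may change."""

    def __init__(self, fields, var_attrs):
        object.__setattr__(self, "_var", frozenset(var_attrs))
        for name, value in fields.items():
            object.__setattr__(self, name, value)

    def __setattr__(self, name, value):
        if name not in self._var:                      # constraint b
            raise AttributeError(f"{name} is not a var attribute")
        object.__setattr__(self, name, value)


class BaseClass:
    """A base class: a modifiable sequence of elements plus its constraints."""

    def __init__(self, var_attrs=(), key=()):
        self.var_attrs, self.key = tuple(var_attrs), tuple(key)
        self.elements = []
        self.part_of = []      # (owner class, attribute) pairs that may contain our elements

    def mk(self, **fields):
        """Create a new element and insert it, unless the key constraint is violated."""
        if self.key:
            new_key = tuple(fields[a] for a in self.key)
            for e in self.elements:                    # constraint a
                if tuple(getattr(e, a) for a in self.key) == new_key:
                    raise ValueError(f"key {new_key} already present")
        element = Element(fields, self.var_attrs)
        self.elements.append(element)
        return element

    def delete(self, element):
        """Remove an element unless it is still a component of another object (constraint c)."""
        for owner, attr in self.part_of:
            for obj in owner.elements:
                value = getattr(obj, attr)
                parts = value if isinstance(value, list) else [value]
                if any(p is element for p in parts):
                    raise ValueError("element is still a component of another object")
        self.elements = [e for e in self.elements if e is not element]


Departments = BaseClass(var_attrs=("Budget", "Manager", "Employees"), key=("Name",))
Employees = BaseClass(var_attrs=("Salary",), key=("Name",))
Employees.part_of = [(Departments, "Manager"), (Departments, "Employees")]

d = Departments.mk(Name="Research", Budget=100, Address="Pisa", Manager=None, Employees=[])
e = Employees.mk(Name="Rossi", Salary=50, Dept=d)
d.Manager, d.Employees = e, [e]
e.Salary = 60    # one shared object, so the change is also visible as d.Manager.Salary
```

Python object references give the "uniquely represented" behaviour for free: updating `e` is reflected wherever `e` appears as a component.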

Derived  Classes 

In Galileo the two notions behind the IS-A
hierarchy are expressed with two distinct
mechanisms: The type hierarchy, to deal with the
intensional aspect, and the derived class, to deal
with the extensional aspect. A derived class
implies an existence constraint between its
elements and those of the parent class, i.e. the
elements of a derived class are also elements of
its parent class. The type of the elements of a
derived class must be a subtype of the element
type of the parent class. As a consequence of the
subtype hierarchy, the elements of a derived class
can be used as actual parameters for any operation
defined for the elements of its parent class.

There are three ways of defining a derived class:
by subset, partition or restriction (Albano 82).
Let us consider the first one, which is the
mechanism to model IS-A hierarchies.

A subset class contains a subset of the elements
of the parent class, which have been included
explicitly with the proper operator. When a new
element is added to a subset class, it also
becomes an element of the parent class.

Classes can also be derived from more than one
parent class, with the restriction that the type
of the elements must be a subtype of all the
element types of the parent classes.

For  example: 

    Secretaries subset of Employees class
          Secretary <-> (is Employee and Position: string)

    FemaleEmployees subset of Employees class
          FemaleEmployee <-> (is Employee and Maternities: var num)

    FemaleSecretaries subset of Secretaries, FemaleEmployees class
          FemaleSecretary <-> (is Secretary, FemaleEmployee)

The  Employees  are  specialized  in  two  overlapping 
subset  classes,  which  in  their  turn  are  parents  of 
another  derived  class. 
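The extensional side of subset classes amounts to an existence constraint: inserting into a subset class also inserts into every parent class, transitively. The Python sketch below is hypothetical (`ClassExtent` and `add` are illustrative names) and ignores the type side, which Galileo checks separately. It shows that an element added to FemaleSecretaries ends up in all four classes of the example, and appears only once in Employees even though it is reached along two paths.

```python
class ClassExtent:
    """Extent of a class; a subset class lists its parent classes."""

    def __init__(self, name, *parents):
        self.name, self.parents, self.members = name, parents, []

    def __contains__(self, obj):
        return any(m is obj for m in self.members)    # identity, not value equality

    def add(self, obj):
        """Insert obj here and, recursively, in every parent class."""
        for parent in self.parents:
            parent.add(obj)
        if obj not in self:
            self.members.append(obj)


Employees = ClassExtent("Employees")
Secretaries = ClassExtent("Secretaries", Employees)
FemaleEmployees = ClassExtent("FemaleEmployees", Employees)
FemaleSecretaries = ClassExtent("FemaleSecretaries", Secretaries, FemaleEmployees)
```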

5.  CONCLUSIONS 

The problem of integrating Semantic Data Model
features in a strongly typed programming language
has been addressed. A solution has been shown in
the framework of the language Galileo, designed
specifically with the above goal in mind. In
particular, the generalization abstraction
mechanism has been examined, and it has been shown that
to achieve a true integration of this feature in a
strongly typed language, the type system should
support the notion of type hierarchies.


The presentation has been informal, but this
notion derives naturally from semantic
considerations. The approach adopted is based on a previous
result (Cardelli 82): He has proved, in the
framework of Edinburgh ML, a semantic soundness
theorem for a type system with multiple
inheritance of types, based on Milner's theory of
polymorphism. We are currently working on the
proof of the theorem for the Galileo type system.

A preliminary implementation of a Galileo subset
has been described elsewhere (Albano 83). Presently, the
final definition of the language has been
completed, and a more efficient implementation is in
progress. This is being done by extending the ML
implementation made by Cardelli on a VAX 11/780
running the UNIX operating system.

The implementation of Galileo, for the time being,
is for a single-user environment and does not
include mechanisms for efficient recovery and
concurrency control. In fact, the intent of the
implementation is not to release a DBMS based on a
Semantic Data Model, although ADAPLEX has shown
that the time is ripe for this kind of DBMS as
well. Our main concerns are:

a. To test the features of the language for
conceptual database design;

b. To study the architecture of a Database
Designer's Workbench, the basic facilities, and
tools to support the database design process
(Albano 83).
Abstract


There is an intrinsic simplicity and similarity
in clinical trials, despite their apparent
overwhelming variety. Their basic processes, in
investigating the cause-and-effect relationship
between some intervention and its outcome, are the
same. They also share common problems. Through
the enforcement of the proper procedures we can
ensure the quality of a clinical experiment. The
expense and commitment of time and resources
involved in carrying out clinical trials demand
attention to these matters, if for no other reason
than to achieve some result for the funds invested
and to ensure the safety of participating patients.

Research at the University of Toronto into the
problems of interactive information systems design
has resulted in the development of a high-level
programming language and design environment called
TAXIS. It is our contention that the application of
the TAXIS tools and methodology to the clinical
trial management domain will result in easier-to-write
systems that are more understandable, modifiable,
extensible, reliable, transportable and, in
the long run, less costly.

Motivation 

There is an intrinsic simplicity and similarity
in clinical trials, despite their apparent
overwhelming variety. There are prevention and
intervention trials; small trials of a few subjects,
and large multi-centre trials involving tens or
hundreds of institutions and thousands of patients.
They differ in such details as the inclusion or
exclusion criteria for accepting or rejecting
patients into the trial; the methods for allocating
patients to experimental and control groups; the
many types of treatment, investigation and data
collection steps that make up the protocol; and so
on. But their basic processes, in investigating
the cause-and-effect relationship between some
intervention and its outcome, are the same. They
also share common problems.

In the final analysis, many trials are found to
be faulty. Some of the reasons include, of course,
the formulation of inappropriate or untestable
hypotheses, and the failure to do preliminary
studies on the adequacy of the available
experimental population. These types of problems (errors
in the concept of the trial itself) must be
solved by the designers of medical experiments and
do not concern us.
There is a class of faults in the performance
of clinical trials, however, which is amenable to
solution by what one would consider to be
algorithmic techniques. These are faults in the process,
or event and data collection sequence, of the trial.
Through the enforcement of the proper procedures we
can ensure the quality of the experiment itself.

It is our contention that the expense and commitment
of time and resources involved in carrying out
clinical trials demand attention to these matters, if
for no other reason than to achieve some result
for the funds invested and to ensure the safety
of participating patients.

Data management during a clinical experiment
involves a high level of manpower-intensive
clerical activity. Personnel must ensure that the
protocol is followed; that the events of the trial are
documented; that reports are produced and
circulated; and so forth. Recently, there has been a
trend for some of these activities to be relegated
to one or another type of computer-based information
system. Systems are now being implemented wherein
clinical researchers input data directly at
terminals and are able to review the current data of a
trial by retrieval using a terminal. Their database
transactions are usually short and completely
predictable, and demand fast response.

Clinical trial management systems thus pose
problems typical of Interactive Information Systems
(IIS): capturing somehow an inherent
and conceptually simple procedural structure that is
obscured by a morass of details and exceptions;
organizing and ensuring the integrity of great
masses of data; and providing many short but
predetermined transactions for busy, often
computer-naive users. Conventional languages and classic
programming techniques offer little help to the
system designer facing such challenges.

Research at the University of Toronto into the
problems of IIS design has resulted in the
development of a high-level programming language and
design environment called TAXIS. It is our
contention that the application of the TAXIS tools and
methodology to the clinical trial management domain
will result in a powerful support system that is
easier to write, more understandable, modifiable,
extensible, reliable, transportable and, in the
long run, less costly than a system developed using
conventional programming languages.

TAXIS  Language  and  Design  Environment 

TAXIS combines concepts from three areas of
computer science research: a semantic network
framework from artificial intelligence, data
abstraction and exception handling from programming
languages, and relational database facilities and
semantic integrity constraints from database
management. Integrating these concepts in a common
formalism, TAXIS provides mechanisms not found in
conventional languages such as COBOL, FORTRAN, PL/1 or
PASCAL for the organization of large amounts of
conceptually simple descriptive and procedural
details, typical of its IIS applications.

The methodological tools and the IIS problems
they address are outlined in a paper by Borgida
(reference 5).

In brief, TAXIS provides a means of arranging
masses of information in a natural and
understandable way, using taxonomies (hierarchies) in its
data model and its procedures. TAXIS uses a
semantic data model, based on "objects" representing
real-life entities and the relationships between
objects. Like other semantic networks, TAXIS
relates objects not just to each other, but also to
the procedures which operate on them. TAXIS
provides a unified approach to data and procedures
alike.

Abstraction  Tools 

Describing a large enterprise is difficult.
Abstraction tools can make this task more manageable,
while at the same time promoting the design of
well-formed systems. Three such tools pervade TAXIS:
classification, aggregation and generalization.

1)  Classification 

Rather  than  considering  every  individual 
object  separately,  it  is  useful  to  group  similar 
objects  into  "classes"  ("TAXIS"  is  a  Greek  word 
meaning  "class")  such  as  "PERSONS,"  "INTEGERS"  and 
"COLORS."  Thus,  the  class  COLOR  might  be: 

(/  red,  orange,  ...  violet  /) 

The  relationship  "is"  (meaning  "is  an  instance  of") 
indicates  membership  in  a  class: 

"red  is  COLOR". 

Applying classification to classes themselves, we
get "metaclasses," such as "PERSON-CLASS," whose
instances include PATIENTS, DOCTORS, EMPLOYEES,
ADULTS, and so on. Classification allows a
programmer to abstract a common generic description
(aggregation), instead of having to consider all
instances separately.

2)  Aggregation 

Objects are described by associating properties
with a class; these in turn induce corresponding
properties for each of its instances. There
are "definitional properties" for a class, such as
"name" and "age" for PERSON:

PERSON 

name :  STRING 

age:  POSITIVE-INTEGER 

and corresponding "factual properties" for
each instance of the class:

person1.name = "John Smith"
person1.age = 20

The  definitional  property  value  enforces  a  range 
constraint  on  induced  factual  property  values,  as  in 
20  is  POSITIVE-INTEGER 
"John  Smith"  is  STRING 

The purpose of aggregation is to describe
classes by associating them, through their property
values, with other classes. But detailed
consideration of these related classes
can be left for later.
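The range constraint that a definitional property imposes on factual properties can be checked mechanically. The Python sketch below is a hypothetical encoding, not TAXIS syntax. Each value class (STRING, POSITIVE-INTEGER) is modelled as a membership predicate, a class's definitional properties are a dict from property name to value class, and an instance is a dict of factual properties.

```python
def is_string(value):
    return isinstance(value, str)


def is_positive_integer(value):
    return isinstance(value, int) and not isinstance(value, bool) and value > 0


# Definitional properties of PERSON: property name -> class of allowed values
PERSON = {"name": is_string, "age": is_positive_integer}


def range_violations(definitional, instance):
    """Names of factual properties that are missing or fall outside their value class."""
    return [prop for prop, value_class in definitional.items()
            if prop not in instance or not value_class(instance[prop])]


person1 = {"name": "John Smith", "age": 20}
```

An empty result from `range_violations(PERSON, person1)` corresponds to the two statements "20 is POSITIVE-INTEGER" and "John Smith is STRING" holding.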

3) Generalization/Specialization

Classification and aggregation alone seldom
make the mass of detailed information in an IIS
manageable. Experience has shown that groups of
classes are often similar in their descriptions and
in how they are treated. Further, it is beneficial
to describe their common features in a general
class of objects, and then to specialize. This is
most easily done using hierarchies of classes.

For example, PERSON, PATIENT, EMPLOYEE and
CHILD all have common human properties; indeed,
PERSON may be thought of as a generalization of the
others. This can be expressed in TAXIS using the
"ISA" relationship:

PATIENT isa PERSON
CHILD isa PERSON
CHILD-PATIENT isa CHILD, PATIENT

The corresponding ISA hierarchy is given in Figure 1.

PERSON (name: STRING; age: POS-INT)
    CHILD isa PERSON (age: POS-INT-UNDER18)
    PATIENT isa PERSON (insurance#: OHIP#)
        CHILD-PATIENT isa CHILD, PATIENT (guardian: ADULT)

Figure 1: An ISA Hierarchy

Here we can see illustrated the following rules for
class specification:

a) A specialized class "inherits" all definitional
properties of its parent (more generalized)
class(es). E.g. CHILD and PATIENT inherit "name"
and "age" from PERSON.

b) The classes for inherited properties may be
more specialized than in the parent class (e.g. "age"
in CHILD) or may remain the same (e.g. "name").

c) A specialized class may have additional
properties (e.g. "insurance#" for PATIENT).
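Rules a) to c), together with the multiple inheritance of CHILD-PATIENT in Figure 1, can be sketched in Python. This is an illustrative model only, and `TaxisClass`, `isa` and `properties` are hypothetical names. Value classes such as POS-INT-UNDER18 are treated as classes too, so that rule b) can be checked with the same `isa` test. When two parents disagree, the more specialized value class is kept.

```python
class TaxisClass:
    """A class with parent classes and its own (new or refined) definitional properties."""

    def __init__(self, name, parents=(), own=None):
        self.name, self.parents, self.own = name, tuple(parents), dict(own or {})

    def isa(self, other):
        return self is other or any(p.isa(other) for p in self.parents)

    def properties(self):
        """Inherited properties (rule a), possibly specialized (rule b), plus new ones (rule c)."""
        props = {}
        for parent in self.parents:
            for prop, value_class in parent.properties().items():
                old = props.get(prop)
                if old is None or value_class.isa(old):
                    props[prop] = value_class        # keep the more specialized class
                elif not old.isa(value_class):
                    raise TypeError(f"{self.name}: conflicting classes for {prop}")
        for prop, value_class in self.own.items():
            if prop in props and not value_class.isa(props[prop]):
                raise TypeError(f"{self.name}.{prop} must specialize {props[prop].name}")
            props[prop] = value_class
        return props


STRING, POS_INT, OHIP = TaxisClass("STRING"), TaxisClass("POS-INT"), TaxisClass("OHIP#")
UNDER18 = TaxisClass("POS-INT-UNDER18", [POS_INT])
PERSON = TaxisClass("PERSON", own={"name": STRING, "age": POS_INT})
ADULT = TaxisClass("ADULT", [PERSON])
CHILD = TaxisClass("CHILD", [PERSON], {"age": UNDER18})
PATIENT = TaxisClass("PATIENT", [PERSON], {"insurance#": OHIP})
CHILD_PATIENT = TaxisClass("CHILD-PATIENT", [CHILD, PATIENT], {"guardian": ADULT})
```

For CHILD-PATIENT the sketch yields "name", "age" restricted to POS-INT-UNDER18, "insurance#" and the new "guardian". A redefinition that does not specialize the inherited class, such as giving "age" the class STRING, is rejected.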

CHILD-PATIENT illustrates multiple inheritance, with
inherited properties and constraints "name," "age"
(<18), and "insurance#," plus a new property
"guardian."

Exception  Handling 

Dealing with exceptions in conventional
languages is a messy business; programs end up with a
tangle of "if ... then ... else" clauses that may
rarely be used. TAXIS allows exceptions to be dealt
with in a very flexible way, outside the main
program. During the first few passes, a system
designer need only specify conditions for raising
exceptions; handlers are written separately later.
This mechanism frees programmers from initial
consideration of details, and helps simplify and
modularize programs.
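The separation between raising an exception and handling it can be illustrated with a small Python sketch. This is not TAXIS's mechanism: `TaxisException`, `HANDLERS`, `handles`, `run`, `record_infusion` and `remind_clinic` are hypothetical names. On a first design pass only the raising condition is written. The handler is registered later without editing the transaction, and until then the exception propagates unhandled.

```python
class TaxisException(Exception):
    """A named exception raised when a condition of a transaction fails."""

    def __init__(self, name, **details):
        super().__init__(name)
        self.name, self.details = name, details


HANDLERS = {}   # exception name -> handler, filled in on a later design pass


def handles(name):
    """Decorator registering a handler for a named exception."""
    def register(fn):
        HANDLERS[name] = fn
        return fn
    return register


def run(transaction, *args):
    """Run a transaction, routing a raised exception to its separately written handler."""
    try:
        return transaction(*args)
    except TaxisException as exc:
        handler = HANDLERS.get(exc.name)
        if handler is None:
            raise                       # no handler written yet: propagate
        return handler(exc)


# First pass: only the condition that raises the exception is written.
def record_infusion(form):
    if form is None:
        raise TaxisException("missing-form", state="infusion")
    return "recorded"


# Later pass: the handler is added without touching record_infusion.
@handles("missing-form")
def remind_clinic(exc):
    return f"reminder sent ({exc.details['state']})"
```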

Integrity  Constraints 

The specification, inheritance and specialization
of property value classes provide convenient
structural mechanisms for expressing data
constraints. The database operators maintain the
integrity of the database values within this
framework. Property values may be further constrained
by assertions associated with data class
definitions. And behavioral constraints are implicit in
the procedural hierarchies of TAXIS: the effects of
a particular operation on a class must be preserved
by corresponding operations on specializations of
the class.


Procedures

TAXIS offers two kinds of procedures:
transactions, which are atomic manipulations of the
database or local variables; and scripts. Scripts
are concurrent processes of significant duration
(even months); they can communicate with users and
with each other. The event sequences of an IIS are
modelled in the scripts of its TAXIS program.

Figure 2, for example, shows a pictorial
representation of part of a MONITOR script which
follows each patient through a treatment cycle in
our cancer-trial system. The format is basically
what is called a Petri net, adapted from Zisman,
with states (ellipses) for each trial event, and
transitions (horizontal bars) between states
describing system activities.

[Petri-net diagram. States: infusion, nadir-counts,
follow-up-one. Transitions: find-infusion-form (look
for form with infusion details); day14 (infusion to
nadir-counts); find-nadir-form (look for form with
toxicity measures); day28 (nadir-counts to
follow-up-one); find-response-form (look for form
on response to infusion).]

Figure 2: Part of a Script (informal)

The first event is a drug infusion. In the
infusion state, the script expects to be given a
data form about the infusion; failure of the form
to arrive raises an exception whose handler is
specified separately. By day 14, the infusion is
over and its toxic effect has peaked, so the script
moves on to the nadir state, in which it awaits a
form with toxicity measures. All trial events are
dealt with in a similar way.

One can see a close correlation between the
English description above and the Petri net diagram.
The translation from Petri nets to TAXIS
code is also quite straightforward. For example,
TAXIS code for the transitions find-nadir-form and
day28 is given in Figure 3.

transitions
    find-nadir-form
        from nadir-counts to nadir-counts
        conditions
            not-found: nadir-form = nothing
        entered; giving (data-collection-script,
                         f: NADIR-FORM)
        actions
            get-form: take (data-collection-script, f)
            store-form: nadir-form <- f
    end find-nadir-form;

    day28
        from nadir-counts to follow-up-one
        conditions
            c1: day = 28
        postconditions
            found?: not nadir-form = nothing
                exception missing-form
    end day28;

Figure 3: Part of a Script (formal TAXIS)
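The MONITOR fragment can be approximated as a transition system in Python. This is a minimal sketch under simplifying assumptions, not the TAXIS script semantics. Each script instance carries a single marked state, which is enough for this linear fragment but is not a general Petri net. Forms "given" by the data-collection script are modelled as a hypothetical `inbox` list. A failed postcondition raises the named exception after the transition has fired. `Transition`, `Script`, `take_form` and `monitor` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


class ScriptException(Exception):
    """Raised when a transition's postcondition fails; the message is the exception name."""


@dataclass
class Transition:
    name: str
    source: str
    target: str
    condition: Callable[[dict], bool] = lambda env: True
    action: Callable[[dict], None] = lambda env: None
    postcondition: Optional[Callable[[dict], bool]] = None
    exception: str = ""


@dataclass
class Script:
    state: str                     # the single marked state of this script instance
    transitions: list
    env: dict = field(default_factory=dict)

    def step(self):
        """Fire the first enabled transition leaving the current state; return its name."""
        for t in self.transitions:
            if t.source == self.state and t.condition(self.env):
                t.action(self.env)
                self.state = t.target
                if t.postcondition is not None and not t.postcondition(self.env):
                    raise ScriptException(t.exception)
                return t.name
        return None


def take_form(env):
    # get-form / store-form: take the next submitted form and keep it as the nadir form
    env["nadir-form"] = env["inbox"].pop(0)


def monitor():
    """Hypothetical encoding of the day14, find-nadir-form and day28 transitions."""
    return Script("infusion", [
        Transition("day14", "infusion", "nadir-counts", lambda e: e["day"] == 14),
        Transition("find-nadir-form", "nadir-counts", "nadir-counts",
                   lambda e: e["nadir-form"] is None and bool(e["inbox"]), take_form),
        Transition("day28", "nadir-counts", "follow-up-one", lambda e: e["day"] == 28,
                   postcondition=lambda e: e["nadir-form"] is not None,
                   exception="missing-form"),
    ], {"day": 0, "nadir-form": None, "inbox": []})
```

If no nadir form has arrived by day 28, `step` moves the script to follow-up-one and raises `missing-form`. The handler for that exception is left to be written elsewhere, as the text describes.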


A recent trend in semantic networks has been
toward the extension of taxonomies to embrace
procedural classes. In much the same way as data
classes are specialized, so too are TAXIS
transactions and scripts. One may think, for example,
of the activities during different types of patient
appointments as being specializations of (i.e.
additions to) some general core visit process. The
MONITOR script, too, has specializations, based on
response to treatment.

TAXIS  thus  offers  the  following  advantages 
over  more  traditional  languages: 

1) Initial system design at a very high level,
stripped of complicating details and exceptions,
and not far removed from a verbal description of
the application processes (as shown in the MONITOR
fragment).

2) Provision for the systematic reintroduction
of detail during design, through specialization,
exception handling and the conventional calling
hierarchy.

3) Convenient and powerful built-in integrity
constraints.

4) Reduction of repetitive coding (and possible
inconsistencies) through ISA inheritance.

5) Well-structured systems that are
relatively easy to understand, extend and modify.

6) Separation of general concepts from
specific details in the ISA hierarchies, leading to
the ultimate generalization: a modelling
framework that can be applied to any related domain.

More on TAXIS can be found in references 4, 5, 8,
9, 10, 11 and 25.


TAXIS  Project 

Various parts of the TAXIS project are
proceeding in parallel with our clinical trial system
design: a translator into PASCAL/R (B. Nixon, M.Sc.
Thesis), an interactive programming environment
including an editor (P. O'Brien, M.Sc. Thesis), a
methodology for the design of user interfaces
(M. Pilote, Ph.D. Thesis), a related requirements
specification language (S. Greenspan, Ph.D. Thesis),
a pacemaker clinic system (C. DiMarco, M.Sc. Thesis),
a translator for scripts (L. Chung, Ph.D. Thesis),
and a credit card verification system (S. Park,
M.Sc. Thesis).

Current Status of Clinical Trial Management Systems

Most of the work done in the area of computer
support for clinical trials has been in the
development of file management and database management
systems to collect, store, retrieve and report on
data and to provide tools for analyzing and
presenting results. We call such systems "research
data management systems." Some of these
information systems provide strengths in key areas such as
more advanced quality control of data; better,
easier-to-use query languages; and other such
capabilities.

At the next level are systems which provide
more operational support for clinical research. We
would term such systems "patient follow-up support
systems." In this case, additional tools are
provided which allow the clinical researchers to
schedule patients for visits, send out notices for
upcoming visits, produce reports before and after
visits, detect failures to show and produce
"delinquency" notices, and so forth. A variety of
such systems have been developed, including one at
the Division of Cardiology at Toronto General
Hospital (reference 12).

At the highest level, systems have been
developed which provide tools for the actual sequencing
of the documentation, investigation and treatment
process. Such systems we call "protocol management
systems"; in their most complete form, they are
comprehensive clinical trial management systems.

It  is  worthwhile  looking  at  this  type  of 
system  in  greater  detail. 

Williard (reference 12) has documented the characteristics
(a model) of a system which would provide clinical
trial management capability. He proposes that the
procedures or protocols that make up a clinical
trial be specified as a series of tasks to be
initiated and executed by the computer system.

A similar design for a clinical trial management
system has been submitted by Lee (reference 13); it
includes a protocol specification procedure and the
definition of what are to be considered major and minor
deviations from the protocol (in an attempt to
characterize when an experiment is going awry).

This  system  (under  development  in  1979)  is  to  be 
integrated  with  several  existing  patient  support 
and  other  systems. 

Another clinical trial management tool has been developed by Friedman utilizing minicomputers and the MIIS language. This system provides the following general functions: appointment scheduling, notice that a patient has been admitted to hospital, a laboratory test database, a protocol scheduling system and a protocol data recording system to capture data at every stage in the clinical protocol. The trial manager can easily spot missing data from tabular displays, although the system itself does not flag it. To adapt the system for a new trial (i.e. to specify a new protocol), a user need only enter the details of this protocol in special tables kept by the system. Standardized programs access these tables and drive the process.
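The table-driven approach described above can be illustrated with a small sketch. It is not Friedman's MIIS program: the table layout (visit name, days after study entry, forms to record), the example visits and the function names are all hypothetical. The point is that the generic routines never change; adapting the program to a new trial means editing only the table.

```python
from datetime import date, timedelta

# Hypothetical protocol table for one trial: (visit name, days after entry, forms to record).
# In a table-driven design this is the only trial-specific part.
PROTOCOL_TABLE = [
    ("baseline", 0, ("history", "labs")),
    ("week4", 28, ("labs", "toxicity")),
    ("week12", 84, ("labs", "toxicity", "response")),
]

def schedule_visits(entry_date, table):
    """Generic routine: compute each visit's due date from the table."""
    return [(name, entry_date + timedelta(days=offset)) for name, offset, _ in table]

def forms_due(visit_name, table):
    """Generic routine: look up the forms to be recorded at a given visit."""
    for name, _, forms in table:
        if name == visit_name:
            return forms
    raise KeyError(visit_name)
```

For a patient entering on 10 January 1983, the week4 visit falls due on 7 February 1983; a different trial would supply a different PROTOCOL_TABLE and reuse both functions unchanged.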


Moore has described a somewhat similar task-oriented trial system. The clinical researcher describes the experimental protocol in terms of a series of tasks to be carried out. Such tasks include: the completion of data forms, the performance of procedures, the printing of documents, and the manipulation of data to alter the patient's path in the protocol based on individual response to date. The tasks are encoded in tables, supported again by standard routines. The generation of a new system involves defining the tasks, establishing the tables, and implementing some special optimized data-storage and editing programs for the particular application. This system has been applied to a post-myocardial infarction natural history study and to a propranolol intervention trial.

Wirtshafter carried out an interesting exercise to test whether such systems help in the clinical research process. His study found compliance with the clinical protocol in 94% of cases where a computer assisted in the management of the trial, but in only 64% of cases with a standard manual system.

Along these lines, McAlister at Toronto General Hospital has designed and initiated a clinical experiment to test the hypothesis that a protocol management system active in the treatment of hypertensive patients results in more patients with controlled blood pressure than when the protocol is not enforced by a system. This experiment is currently underway.

A very commonly cited system called P.R.O.G.E.T., developed by Boissel, really falls under the category of follow-up management systems, although it is often listed with clinical trial management systems. Boissel's system makes sure that encounters with the patient occur on time and that critical events occur; it draws management attention to deviations from the trial protocol (documented first by the clinicians), and assists in randomization and stratification.

An example of a system of a similar genre, but not currently applied to clinical research, is the HELP system developed by Gardner and Warner and documented by Cannon [19]. This system is used to ensure that the appropriate investigation and treatment protocols are carried out during normal patient care. Again, the work to date has indicated that the system has a substantial effect on adherence to accepted patient care protocols.

The protocol is specified in terms of a set of rules, specifying that if certain conditions are true, then certain outputs are produced, or inputs are requested. It is clear that this system could be applied to clinical research. Similar comments apply to the PROMIS system developed by Weed.

All of these systems share weaknesses inherent in their conventional programming environments: limited integrity checking on data, inflexible mechanisms for dealing with exceptions, and relatively unstructured code that is difficult to modify or extend.

The most powerful single system is ONCOCIN, developed by Shortliffe at Stanford University. This system is based on artificial intelligence research being carried out by Shortliffe and his collaborators. The clinical protocols for ONCOCIN are specified in the form of production rules: if certain things are true, then certain conclusions must follow or processes must occur. Production rules allow great flexibility in handling exceptions and in altering the course of therapy based on current patient tolerance.
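A toy forward-chaining loop shows how such if-then rules operate. It is our own illustration, not ONCOCIN's rule language or engine, and the two rules (a white-cell-count threshold leading to a dose reduction) are invented for the example and carry no clinical meaning.

```python
# Each rule: (name, condition over the facts, function returning the facts it concludes).
# The rule contents are hypothetical and purely illustrative.
RULES = [
    ("low_wbc", lambda f: f.get("wbc", 99) < 3.0, lambda f: {"toxicity": "moderate"}),
    ("reduce_dose", lambda f: f.get("toxicity") == "moderate", lambda f: {"dose_pct": 50}),
]

def forward_chain(facts, rules):
    """Repeatedly fire rules whose conditions hold and whose conclusions add something new."""
    facts = dict(facts)
    fired = []
    changed = True
    while changed:
        changed = False
        for name, condition, conclude in rules:
            if condition(facts):
                new = conclude(facts)
                if any(facts.get(k) != v for k, v in new.items()):
                    facts.update(new)      # record the new conclusions
                    fired.append(name)
                    changed = True
    return facts, fired
```

Each rule is trivial on its own; the difficulty noted in the next paragraph arises when many such rules interact and must be kept mutually consistent.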

There are general problems with production rules, however. While individually easy to write, understand and modify, en masse they are unstructured and not easy to assimilate; they also tend to develop inconsistencies on modification. And ONCOCIN, while handling the treatment protocol very well, is not concerned with the sequencing and timing of more global trial events.

A further problem with all of these systems is that although they may perform well in the environment for which they were developed, they may or may not be transplantable to other kinds of trials in other settings. The Moore and Friedman systems address this question with their general routines and trial-specific tables. Taxonomies offer a clearer, more understandable separation of general (more abstract) trial features from specific trial details, enabling easier and less expensive transportation.

Our  System 

The major purpose of a clinical trial management system is to ensure valid trial results by promoting adherence to the trial protocol. Merging this aim with our research interest in exploring the power of taxonomies in IIS design, we have focussed our attention on limited goals set in the context of the management of a moderately complex cancer trial [23]. Specifically, we are trying to serve the trial managers by: 1) guiding the collection of data needed for analysis and monitoring, providing some range constraints on that data, and recording omissions and protocol deviations (including timing) in forms completion; 2) recommending individualized treatments based on the trial protocol and each patient's changing condition, and recording deviations in the nature, dose and timing of treatments given; 3) providing stratification and randomization facilities; 4) monitoring the trial processes so as to provide ongoing information to the trial office on progress and on significant, compromising deviations. We are not, at present, implementing such useful (and important) facilities as report generation, analysis or scheduling.

Most current systems have arisen in answer to urgent practical needs: coping with the number and complexity of protocols used in cancer therapy (Friedman, Lee and Shortliffe), and easing the burden of complete and accurate data collection in heart studies (Moore). Shortliffe, however, has also been motivated, like us, by a desire to exercise a programming tool: in his case production rules, in our case TAXIS. His system, backed by his experience with MYCIN, is practical and usable; ours is a rough and incomplete prototype. But our ultimate interest is not so much in creating a specific finished product for a particular trial as in exploring our ability to develop, using TAXIS, a generic framework that can be used for many different trial systems.

Conclusion 

At  this  time,  the  clinical  trial  management 
program  has  been  written,  and  we  are  analyzing  our 
experience  on  this  project.  The  abstraction  tools 
of  TAXIS  seem  to  have  helped  simplify  the  design 
process,  and  the  resulting  system  has  a  pleasing 
simplicity  and  modularity. 

Once the rest of the TAXIS language implementation has been completed, we expect to test the system on some data from the trial for which it was designed. A further goal will then be to try to adapt it for other trials, to see whether we can abstract from the result a high-level model of a general clinical trial management system that will be truly portable and easily extended.

Even now, the details of our particular trial appear in only a few script and transaction families. If we can generalize them (and there is evidence that we can), generating a new system will only involve encoding the new processes in specializations at the bottom of the hierarchical framework. This would reduce the repetitive effort and expense involved in building completely independent systems for every application.
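The idea of confining trial-specific details to the bottom of a hierarchy can be illustrated, by rough analogy, with Python subclassing standing in for Taxis specialization. The class names, visit intervals and arm labels below are all hypothetical; the point is only that a new trial is added by writing a small class at the bottom while the generic behaviour is inherited.

```python
import random

class ClinicalTrial:
    """Generic framework: behaviour shared by every trial (hypothetical)."""
    visit_interval_days = 28  # default spacing between follow-up visits

    def arms(self):
        # Each concrete trial must say which treatment arms it compares.
        raise NotImplementedError

    def randomize(self, rng):
        """Assign a new patient to one of the trial's arms."""
        return rng.choice(self.arms())

    def next_visit_day(self, last_visit_day):
        return last_visit_day + self.visit_interval_days

class CancerTrial(ClinicalTrial):
    """Intermediate specialization: overrides only what differs for this family of trials."""
    visit_interval_days = 21

class TrialX(CancerTrial):
    """Bottom-level specialization: the only trial-specific code."""
    def arms(self):
        return ["arm A", "arm B"]
```

Adding another trial of the same family would mean writing one more class like TrialX, without touching ClinicalTrial or CancerTrial.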

We would like to acknowledge the assistance of Ms. Debbie Schreiber in typing the final manuscript, Brian Nixon and Michele Pilote for their comments, and the reviewers for their very helpful suggestions.
Abstract

Taxis is a language for information system design based on an object-oriented framework for
conceptual modeling that draws on ideas from Artificial Intelligence. Its features make it appropriate
as a tool for the design of Office Automation systems, especially its facility, called Scripts, for modeling
long-term processes involving interaction and communication between the system and its users. This
paper reports on issues related to the implementation of Scripts. In particular, it addresses the efficient
management of processes in the presence of declaratively specified assertions (integrity constraints)
that must be enforced efficiently.
1 Introduction

Taxis is a language for information system design, developed at the University of Toronto.
Work on the Taxis Project has included the initial design of the Taxis language for describing the
semantics of the system's data, transactions, and exceptions [Mylopoulos, 1980] [Wong, 1981]; its
extension, called Scripts, for modeling long-term processes involving interaction between the system
and its users [Barron, 1982]; and facilities for the design of (natural or artificial) languages for
communication between the user and the system [Pilote, 1983]. All of these Taxis language
components are integrated into a single, uniform framework of classes and properties, whose structure
derives from the concept of semantic networks in Artificial Intelligence.

Experience  using  Taxis  includes  its  application  to  hospital  information  systems  [Buchan,  1982] 
[Di  Marco,  1983],  which  seem  to  us  quite  typical  of  office  information  systems.  This  supports  one  of 
our  goals:  to  provide  a  descriptive  formalism  that  is  appropriate  for  modeling  office  systems.  A  second 
goal,  and  the  subject  of  this  paper,  is  that  the  formalism  be  amenable  to  efficient  implementation. 

In  the  work  presented  in  this  paper,  our  main  concern  was  the  efficient  implementation  of 
Scripts.  Two  related  implementation  issues  are  addressed  here  -  namely,  (1)  process  management  and 
(2)  enforcing  semantic  constraints  -  while  other  implementation  issues  are  discussed  in  [Chung,  1984]. 

Scripts model long-term processes, such as the life cycle of an employee, with steps like being
hired, getting a pay raise, retiring or being fired. Scripts offer a Petri-net-like graphical formalism,
inspired by [Zisman, 1978] but incorporating some significant extensions, among them (a) the molding
of scripts into the Taxis framework of classes, instances, and properties, and (b) the addition of
inter-process communication primitives based on Hoare's communicating sequential processes
[Hoare, 1978].
Scripts have two node types, one representing states and the other representing transitions from
one state to another. The description of a state includes, among other things, assertions that must be
true in that state, while the description of a transition includes, among other things, a list of conditions
(Boolean expressions) that must be true for the transition to fire and a list of actions (expressions with
side effects) that are to be executed if it fires. Expressions reference data objects in the underlying
Taxis semantic model. Actions may involve calls to Taxis transactions and the raising and handling of
exceptions.

The first problem addressed in the paper is the management of long-term processes. There
may be a large number of active transitions at any time (1), and it is important to use a process
scheduling strategy that is both "fair" and efficient. By efficient we mean, for example, that it should
not be necessary to sift through a large number of transitions to find one that can be fired.

An important feature of Taxis, or of any other language for modeling the office, is the ability to
use assertions to express constraints declaratively. This is the second issue dealt with in the paper.
Assertions have been found very useful in Taxis, attached as invariants on states, as global constraints
on a script, and as invariants on the data classes referred to in the script. These assertions, however,
present a dilemma, since there is clearly a tradeoff between the expressiveness of the assertions, on
the one hand, and the ability of the implementation to check efficiently that none of the constraints is
ever violated, on the other. Whenever any database transaction (e.g. one called from within the code
for a transition) is executed, potentially all of the constraints are put into question, and techniques
must be employed to ensure that they still hold. In order to strike a balance between generality of
assertions and efficiency of implementation, we have combined a variety of optimization techniques
and have accepted the restrictions they place on the assertion language.

Section 2 explains some of the details of Scripts through examples. Section 3 then discusses
strategies for process management. Section 4 discusses issues of semantic integrity assertions and
exception handling in Taxis. Section 5 describes the implementation.

2  Taxis 

2.1  Preliminaries 

The Taxis framework is based on three mutually orthogonal abstraction mechanisms drawn
from the area of Knowledge Representation in Artificial Intelligence: aggregation, classification and
generalization/specialization (the Is-A hierarchy).

Entities in an enterprise are modeled by objects, which are grouped into classes (e.g.
Employee); classes in turn are grouped into metaclasses. An object can have factual properties (e.g.
john.salary has value $30,000) in accordance with the definitional properties of its class (e.g.
Employee.salary is PositiveInteger). Property selection retrieves facts and definitions.

(1)  This  is  partly  due  to  the  fact  that,  in  contrast  to  Zisman’s  Augmented  Petri  nets,  script  transitions 
are  not  treated  as  instantaneous  events;  rather,  they  can  be  active  for  long  periods  of  time  (days  or 
weeks),  for  example  because  they  are  waiting  for  communication  or  for  some  condition  to  become 
true. 
There are several special kinds of classes. The finitely defined classes are comparable to
standard programming language types: the integer, boolean, enumeration and string classes. The
composite classes (transaction classes, state classes, script classes and exception classes) need to be
described briefly.

Data classes may be compared to relations. Their instances (member objects) may be inserted
and deleted. Property categories impose structure on properties. For example, data classes have
attribute properties, which can be modified, and characteristic properties, which cannot.

Transactions describe the way in which the database changes, and model short-term processes.
By short-term, we mean a transient event such as calculating the average salary of the Employee class.
For this reason, transactions are treated as atomic units of execution in Taxis.

Exceptions are raised not only when a precondition or a postcondition of a transaction is violated
before or after its execution, but also when a postcondition of a transition is violated. When an
offending transaction or transition raises an exception, control is transferred to an exception handler
specified by the caller.

Descriptions of data classes, state classes and script classes are centered around their semantic
integrity assertions. Objects that are instances of a class are governed by the assertions of that class.

The  classes  that  can  be  equipped  with  assertions  for  their  instances  are  data  classes,  state 
classes  and  script  classes. 

2.2 Assertions on Data Classes


Properties  of  data  classes  can  be  associated  with  assertions.  For  example: 
dataClass Employee with
  characteristics
    employeeNumber: PositiveInteger;
    retirementDate: $Time;
    employmentDate: $Time
  attributeProperties
    dept: Department;
    salary: PositiveInteger;
    status: {| dead, alive |}
  assertions
    salIncreasing: salary.$old < salary.$new
        exception SalDecreasing (salary, ...);
    salDeptBounded: salary <= (dept.budget / 10) exception ...;
    salMaxBounded: salary < 100000 exception ...;
    retireBeforeDate: retirementDate before $now exception ...
  keys
    employeeKey: (employeeNumber)
endDataClass
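The following Python sketch mimics how the Employee assertions might be checked when a salary is updated. It is an illustration under our own assumptions, not the Taxis implementation: the old value is kept so that the dynamic assertion salIncreasing can compare salary.$old with salary.$new, and a violation raises a hypothetical AssertionViolation exception carrying the name of the Taxis exception or assertion. retireBeforeDate and the key are omitted.

```python
from dataclasses import dataclass

class AssertionViolation(Exception):
    """Hypothetical stand-in for the Taxis exception raised by a violated assertion."""

@dataclass
class Department:
    name: str
    budget: int

@dataclass
class Employee:
    employee_number: int
    dept: Department
    salary: int

    def set_salary(self, new_salary):
        old = self.salary
        # salIncreasing: dynamic assertion salary.$old < salary.$new
        if not old < new_salary:
            raise AssertionViolation("SalDecreasing")
        # salDeptBounded: salary <= dept.budget / 10
        if not new_salary <= self.dept.budget / 10:
            raise AssertionViolation("salDeptBounded")
        # salMaxBounded: salary < 100000
        if not new_salary < 100000:
            raise AssertionViolation("salMaxBounded")
        self.salary = new_salary  # commit only if every assertion holds
```

The sketch checks assertions only on salary updates; as the text below points out, salDeptBounded must also be rechecked when an employee's department or a department's budget changes.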

This is the declaration of the Employee class with four assertions:

salIncreasing is a dynamic assertion that the salary of each employee must be increasing.

salDeptBounded requires that no employee earn more than one tenth of his department's
budget.

salMaxBounded requires that every employee earn less than $100,000.

retireBeforeDate requires that no employee work after his retirement date. $now denotes the
current time according to the system clock.


Note that when an assertion refers to a property value through a property selection, more than
one class is governed by that assertion. The second assertion, salDeptBounded, is of this kind: the
salary of an employee and the budget of a department share the same assertion. Whenever the salary
of an employee, the department of an employee or the budget of a department is modified, this
assertion has to be checked against the modification.
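One simple way to find the assertions that a modification puts in question is an index from (class, property) pairs to the assertions that read them, including reads made through property selection. The sketch below builds such an index by hand for the Employee assertions; it is our own illustration, and the table layout is an assumption, not necessarily the technique used in the Taxis implementation.

```python
from collections import defaultdict

# Which (class, property) pairs each assertion reads, following property selections.
# salDeptBounded reads Employee.salary, Employee.dept and, through dept, Department.budget.
ASSERTION_READS = {
    "salIncreasing": [("Employee", "salary")],
    "salDeptBounded": [("Employee", "salary"), ("Employee", "dept"), ("Department", "budget")],
    "salMaxBounded": [("Employee", "salary")],
    "retireBeforeDate": [("Employee", "retirementDate")],
}

def build_index(reads):
    """Invert the table: (class, property) -> assertions to recheck when it changes."""
    index = defaultdict(set)
    for assertion, pairs in reads.items():
        for pair in pairs:
            index[pair].add(assertion)
    return index

INDEX = build_index(ASSERTION_READS)

def assertions_to_check(cls, prop):
    """Return, in sorted order, the assertions affected by modifying cls.prop."""
    return sorted(INDEX.get((cls, prop), set()))
```

With this index, changing a department's budget rechecks only salDeptBounded, and changing an employee's status rechecks nothing.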

[Di Marco, 1982] introduces assertions on data classes and gives a collection of examples.

2.3 State classes


The abstraction principles of Taxis proved equally powerful when applied to the states of scripts
[Buchan, 1982], [Di Marco, 1982], which are now instances of state classes.

Since they are treated like data classes, state classes can have assertions associated with
characteristic properties, attribute properties, or both. Assertions on state classes are particularly
useful for guaranteeing the successful firing of outgoing transitions. For example:
stateClass EmployeeReadyToHire isA InitialState with
  characteristics
    empReadyToHire: Employee
  assertions
    empAlive: empReadyToHire.status = alive exception ...
  keys
    empReadyToHireKey: (empReadyToHire)
endStateClass

The assertion empAlive requires that the employee be alive while any outgoing transition of an
instance of EmployeeReadyToHire is enabled or firing. Since firing takes some time to complete, it may
cease to be valid to run a transition during that period; it is therefore also necessary to declare
appropriate preconditions on transitions.

[Buchan, 1982] introduces assertions on state classes and gives a collection of examples of them.

2.4  Script  classes 

Scripts operate on objects of data classes, state classes, and finitely defined classes. Among
other uses, assertions on script classes are useful in the following ways:

Mutually  dependent  parameters,  locals  and  states  can  be  inter-related. 

Domain  constraints  for  locals  that  are  instances  of  finitely  defined  classes  can  be  conveniently 
enforced. 

The  domain  of  properties  of  data  classes  can  be  further  restricted  according  to  the  needs  of 
individual  script  classes. 

When one assertion applies to several states of a script, it can be attached once to the script
class rather than to each of the states concerned.

Suppose we have a script class named ManageMaintenanceEmployee, each instance of which is
responsible for maintaining the salary of an employee working in the maintenance department until it
retires the employee. Also suppose that the salary of a maintenance employee is increased by some
amount each year, and that the procedure to retire an employee takes place one day before the
retirement date set for that employee. The augmented Petri net (see the first diagram in this paper) has
three states, eReadyToHire, eWorking and eRetired, and three transitions, hire, increaseSalary and
retire. Here are the corresponding Taxis declarations:
scriptClass ManageMaintenanceEmployee (e: Employee) with
  locals
    timeToIncreaseSalary: $Time;
    tenure: PositiveInteger;
    dateToProcessRetirement: $Time;
    employmentDate: $Time
  states
    eReadyToHire: EmployeeReadyToHire [empReadyToHire: e];
    eWorking: NonInitialState;
    eRetired: NonInitialState
  transitions
    hire
      from eReadyToHire
      to eWorking
      actions
        a1: timeToIncreaseSalary <- /* set time to increase salary */;
        a2: dateToProcessRetirement <- e.retirementDate;
        a3: dateToProcessRetirement.day <- dateToProcessRetirement.day + 1;
        a4: employmentDate <- e.employmentDate
        ...
    increaseSalary
      from eWorking
      to eWorking
      conditions
        c1: $now after timeToIncreaseSalary
      actions
        a1: tenure <- $now.year - e.employmentDate.year;
        a2: e.salary <- /* assign an increased salary */;
        a3: timeToIncreaseSalary.year <- timeToIncreaseSalary.year + 1
        ...
    retire
      from eWorking
      to eRetired
      conditions
        c1: $now after dateToProcessRetirement
      actions
        ...
  assertions
    ax1: dateToProcessRetirement after employmentDate exception ...;
    ax2: e.dept.name = maintenance exception ...;
    ax3: e.salary > 18000 or tenure < 3 exception ...;
    MaxSalForMaintenanceEmp: e.salary < 90000 exception ...
endScriptClass

Four assertions are included in the definition of the script class ManageMaintenanceEmployee:


ax1 asserts that the date for processing a maintenance employee's retirement be after the
employee's employment date.

ax2 requires that the name of the department of the employee used in creating an instance of
ManageMaintenanceEmployee be maintenance.

ax3 requires that everyone in the maintenance department earn more than $18,000 once he has
been with the company for three years or more.

MaxSalForMaintenanceEmp requires that everyone in the maintenance department earn less
than $90,000. Note that, in general, an employee must earn less than $100,000 by the assertion
salMaxBounded in the data class Employee. MaxSalForMaintenanceEmp is therefore a further
restriction on the domain of salaries for employees in the maintenance department.
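Read together, the class-level and script-level assertions determine which salaries are admissible for an employee managed by a ManageMaintenanceEmployee script. The hypothetical Python predicate below combines them, with tenure in whole years as computed by action a1 of increaseSalary; salDeptBounded is left out because it also depends on the department's budget.

```python
def maintenance_salary_ok(salary, tenure_years):
    """Conjunction of the Employee and ManageMaintenanceEmployee salary assertions (illustrative)."""
    positive = salary > 0                       # salary: PositiveInteger
    sal_max_bounded = salary < 100000           # Employee.salMaxBounded
    ax3 = salary > 18000 or tenure_years < 3    # i.e. tenure >= 3 implies salary > 18000
    max_sal_maint = salary < 90000              # MaxSalForMaintenanceEmp
    return positive and sal_max_bounded and ax3 and max_sal_maint
```

Because the 90000 bound is stricter than the 100000 one, the salMaxBounded term never decides the outcome here, which is the sense in which MaxSalForMaintenanceEmp narrows the domain.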

3  Process  Management 

In a Taxis system, there can be any number of instances of script classes, and each script
instance, in turn, can have any number of transitions, some of which can run concurrently. A script is
alive only while at least one of its transitions is active [Barron, 1982]; otherwise the script is removed.
Hence the management of long-term processes reduces to that of transitions, which is discussed in this
section.


Two requirements must be met for a transition to fire. First, all its input states must be active
and, second, all its conditions must be satisfied. Once these two requirements are met, the transition
executes its actions; if its postconditions are then true, all of its input states are deactivated and all of
its output states are activated. Note also that the execution of actions may be interrupted for the
synchronization of inter-script communication, and resumed upon receipt of a signal from another
script.
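The firing rule just described can be sketched in Python as follows. The sketch is sequential and ignores the interruption for inter-script communication; the Transition class, the representation of the marking as a set of active state names, and the exception name are our own assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set

@dataclass
class Transition:
    name: str
    inputs: Set[str]
    outputs: Set[str]
    conditions: List[Callable[[dict], bool]] = field(default_factory=list)
    actions: List[Callable[[dict], None]] = field(default_factory=list)
    postconditions: List[Callable[[dict], bool]] = field(default_factory=list)

class PostconditionFailure(Exception):
    """Hypothetical stand-in for the exception raised when a transition's postcondition fails."""

def try_fire(t, active, env):
    """Fire t if all input states are active and all conditions hold; return True if it fired."""
    if not t.inputs <= active:
        return False                          # some input state is not active
    if not all(c(env) for c in t.conditions):
        return False                          # some condition is not satisfied
    for a in t.actions:
        a(env)                                # execute actions (side effects on env)
    if not all(p(env) for p in t.postconditions):
        raise PostconditionFailure(t.name)
    active -= t.inputs                        # deactivate input states (mutates the caller's set)
    active |= t.outputs                       # activate output states
    return True
```

Because increaseSalary has eWorking as both its input and its output state, firing it leaves the marking unchanged, which is how the script can loop on that state.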

To clarify what a transition goes through in order to fire, we distinguish the following status values:


disabled -> enabled -> fireable -> running -> pending -> runnable

A disabled transition is one whose input states are not all active, and an enabled transition is one
whose input states are all active but whose conditions are not all satisfied. A fireable transition is one
whose input states are all active and whose conditions are all satisfied, and a running transition is one
that is executing its actions, testing its postconditions or, after all these, deactivating its input states
and activating its output states. A transition is pending if it has issued a synchronization command and
is waiting for an appropriate transition to signal, at which point it becomes a runnable transition. It is
straightforward to give running, pending and runnable transitions fair chances; details are given in
[Chung, 1984]. For the other transitions (with status disabled, enabled or fireable), we briefly describe
a strategy proposed by [Zisman, 1978], then propose the one we use and discuss their relative merits
and disadvantages.
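The first three statuses depend only on the input states and the conditions, so they can be computed directly, as in this small sketch (the enum and function names are ours). The remaining statuses (running, pending, runnable) describe a transition's execution and would be set by the scheduler rather than computed.

```python
from enum import Enum

class Status(Enum):
    DISABLED = "disabled"
    ENABLED = "enabled"
    FIREABLE = "fireable"
    RUNNING = "running"
    PENDING = "pending"
    RUNNABLE = "runnable"

def classify(inputs, active_states, conditions_hold):
    """Status of a transition that is not currently executing."""
    if not set(inputs) <= set(active_states):
        return Status.DISABLED        # some input state is inactive
    return Status.FIREABLE if conditions_hold else Status.ENABLED
```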

3.1  Cyclic  Checking 

In this strategy, proposed by [Zisman, 1978], an active transition set whose members are the
enabled transitions is maintained, and successive scans are made sequentially and cyclically, starting
from the first member, to find a transition whose conditions are satisfied. Once the selected transition
fires, the scan continues from where the previous one left off (2). This strategy has several merits:

it  guarantees  fairness  among  transitions  and  avoids  indefinite  postponement. 

it  is  simple  to  understand  and  easy  to  implement. 

it  has  satisfactory  performance. 

However, performance degrades when the ratio of the number of enabled transitions to the
number of fireable ones is high. Let Ne be the number of enabled transitions, Nf the number of fireable
ones, and Ref = Ne / Nf. The best situation for this strategy occurs when Ref is always unity, indicating
that all the members of the active transition set are actually fireable. The worst case is when Ref
approaches infinity, indicating that no member is actually fireable, or when Ref equals Ne, indicating
that only the last member is fireable, so that Ne probes must be made to find one fireable transition.
When Ref is small, the performance of this strategy is favorable, but as Ref grows, the performance
degrades.
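A small simulation makes the dependence on Ref concrete. The sketch below implements the cyclic scan over a list of enabled transitions, each reduced to a name and a callable reporting whether its conditions currently hold, and counts probes; this representation is an assumption made for illustration.

```python
class CyclicScheduler:
    """Cyclic checking over the active transition set (illustrative sketch)."""

    def __init__(self, transitions):
        self.transitions = list(transitions)  # (name, is_fireable) pairs
        self.pos = 0                          # where the previous scan left off
        self.probes = 0                       # total condition tests performed

    def next_fireable(self):
        """Scan cyclically from the last position; return a fireable transition's name or None."""
        n = len(self.transitions)
        for _ in range(n):
            name, is_fireable = self.transitions[self.pos]
            self.pos = (self.pos + 1) % n
            self.probes += 1
            if is_fireable():
                return name
        return None  # a full cycle found nothing fireable
```

With Ref = Ne = 1,000 (only the last transition fireable), a single call costs 1,000 probes; with every transition fireable (Ref = 1), each call costs one. The sketch omits the removal of transitions whose input states have been deactivated (footnote 2).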

We expect Ne to be large, on the order of hundreds or even thousands in a real environment, and
Nf to be much smaller than Ne, so that Ref is high. We therefore propose an alternative strategy in the
following section.

3.2 Trigger Mechanism

If we maintain only the fireable transitions, we avoid the Ref problem. We therefore introduce a
first-in, first-out queue whose entries are fireable transitions and call it the process queue. The process
queue can be maintained by treating the conditions of transitions as triggering conditions [Eswaran,
1976]; that is, a transition is placed in the process queue the moment all its conditions are found to be
true while all its input states are active.
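A sketch of the process queue, assuming (for illustration only) that each transition declares which stored objects its conditions read, so that an update re-evaluates just the transitions that depend on the changed object. For brevity, every registered transition is assumed to have all its input states active, and the class and method names are hypothetical. The trigger_calls counter records how often the trigger is invoked, which is the cost weighed in the discussion of disadvantages below.

```python
from collections import deque

class TriggerScheduler:
    """FIFO process queue of fireable transitions, maintained by triggers (illustrative)."""

    def __init__(self):
        self.queue = deque()     # fireable transitions, first-in first-out
        self.watchers = {}       # object name -> transitions whose conditions read it
        self.conditions = {}     # transition name -> condition over the store
        self.trigger_calls = 0

    def register(self, name, reads, condition):
        self.conditions[name] = condition
        for obj in reads:
            self.watchers.setdefault(obj, []).append(name)

    def update(self, store, obj, value):
        """Change an object and invoke the trigger for every transition that reads it."""
        store[obj] = value
        for name in self.watchers.get(obj, []):
            self.trigger_calls += 1
            fireable = self.conditions[name](store)
            if fireable and name not in self.queue:
                self.queue.append(name)       # became fireable: join the back of the queue
            elif not fireable and name in self.queue:
                self.queue.remove(name)       # merely enabled again: leave the queue

    def next_fireable(self):
        return self.queue.popleft() if self.queue else None
```

Finding a fireable transition is now a constant-time pop, independent of Ref; the work has moved into update, which runs once per dependent transition on every relevant modification.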

The  merits  of  this  strategy  are: 

it  guarantees  fairness  among  transitions  and  avoids  indefinite  postponement  as  cyclic  checking 
does. 

it avoids the Ref problem.

(2)  Assume  that  at  each  probe,  a  transition  whose  input  states  are  deactivated  will  be  taken  out  of  the 
active  transition  set. 
The disadvantages are:

the cost of implementing triggers is, in general, high.

frequent changes in the status of transitions within the process queue cause frequent invocations of
the trigger mechanism.

The first disadvantage can be lessened significantly with the techniques for efficient detection of
assertion failures described in Section 5, but the second one needs more discussion. In cyclic
checking, each enabled transition is examined, and tested for being fireable, only once during each scan
of the active transition set. With the trigger mechanism, while a fireable transition, say Tf, is
waiting in the process queue for its turn to fire, other running transitions may modify the object(s)
used in Tf's condition expressions in such a way that Tf becomes merely enabled (no longer fireable)
and has to be taken out of the process queue. This requires an invocation of the trigger mechanism. If
Tf becomes fireable again, it has to be put back into the process queue with yet another invocation of
the trigger mechanism. If Tf changes status in this way frequently, the trigger mechanism has to be
invoked at each change, possibly without Tf ever actually firing. This situation does not arise with
cyclic checking. Thus, the performance of the trigger mechanism would degrade if such status changes
were frequent.

However, a careful examination of the behavioral structure of scripts and their transitions shows that
the condition expressions of transitions consist mostly of predicates used for inter-script
communication (or for user I/O), which do not give rise to the situation above. Even when the
condition expressions of a transition within a script use objects other than these predicates, those
objects rarely depend on the operations of transitions in other scripts. This localization of the
objects used in condition expressions implies that the work involved in each invocation of the trigger
mechanism is small, so the second disadvantage is not significant.
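The second disadvantage can be made concrete with a small counter. In this hypothetical sketch, every change of a transition between enabled and fireable costs one trigger invocation; the sequence below models Tf being repeatedly demoted and re-queued by other scripts' actions before it finally fires.

```python
from collections import deque
from typing import Deque, Optional


class TriggeredQueue:
    """Process queue that counts invocations of the trigger mechanism (illustrative)."""

    def __init__(self) -> None:
        self.queue: Deque[str] = deque()
        self.trigger_invocations = 0

    def status_changed(self, name: str, fireable: bool) -> None:
        # Each enabled <-> fireable change of a transition costs one trigger invocation.
        self.trigger_invocations += 1
        if fireable and name not in self.queue:
            self.queue.append(name)
        elif not fireable and name in self.queue:
            self.queue.remove(name)

    def fire_next(self) -> Optional[str]:
        return self.queue.popleft() if self.queue else None


tq = TriggeredQueue()
# Tf becomes fireable, is made merely enabled by another script's action, and so on.
for fireable in (True, False, True, False, True):
    tq.status_changed("Tf", fireable)
print(tq.fire_next(), tq.trigger_invocations)   # Tf 5
```

Five invocations are spent for a single firing; the argument above is that, because condition objects are mostly local to a script, such flip sequences are rare in practice.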

What remains, then, are the techniques for implementing the trigger mechanism efficiently; they are
described in Section 5.

4  Semantic  Integrity 

4.1  Introduction 

We view a Taxis schema as made up of both a logical data structure and semantic integrity assertions
[Tsichritzis, 1982]. The semantic integrity of a database refers to what the database is meant to
hold. It defines the legality of data values and whether the values constitute a meaningful set for a
model of reality, and it is expressed declaratively by assertions. Assertion failures are handled
uniformly by the existing exception handling mechanism, and their automatic detection is made
efficient through optimization techniques.
4.2 Expressions for Assertions

Objects that are instances of a class are governed by the assertions of that class. The classes that
can be equipped with assertions for their instances are, as illustrated in the previous example, data
classes [Di Marco, 1983], state classes [Buchan, 1982] and script classes. In this section, three types
of expression that can be used as assertions are described.

Intra  Class  Assertion 


Assertions that are associated with properties of instances of one class are called intra-class
assertions:


salIncreasing : salary.$old < salary.$new
salMaxBound : salary < 100000
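To illustrate how intra-class assertions constrain a single property, the two examples can be read as predicates over the value before ($old) and after ($new) an update. This Python sketch is only an illustration, assuming that salMaxBound is checked against the new value; the dictionary of lambdas is a hypothetical stand-in for compiled Taxis assertions.

```python
from typing import Callable, Dict, List

# Each intra-class assertion sees one property's value before ($old) and after ($new) an update.
IntraAssertion = Callable[[int, int], bool]

INTRA_CLASS_ASSERTIONS: Dict[str, IntraAssertion] = {
    "salIncreasing": lambda old, new: old < new,     # salary.$old < salary.$new
    "salMaxBound": lambda old, new: new < 100_000,   # salary < 100000, on the new value
}


def failed_assertions(old_salary: int, new_salary: int) -> List[str]:
    """Names of the intra-class assertions that a salary update would violate."""
    return [name for name, holds in INTRA_CLASS_ASSERTIONS.items()
            if not holds(old_salary, new_salary)]


print(failed_assertions(50_000, 55_000))   # []
print(failed_assertions(50_000, 45_000))   # ['salIncreasing']
print(failed_assertions(95_000, 120_000))  # ['salMaxBound']
```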


Inter  Class  Assertion 


One of the powerful tools provided by Taxis is the property selection operator. Assertions that relate
properties of instances of one class to properties of instances of another class, reached through
property selection, are termed inter-class assertions:


salDeptBound : salary <= (dept.budget / 10)
employeeAlive : emp.status = alive
MaxSalForMaintenanceEmp : e.salary < 80000

These  assertions  relate,  respectively,  a  data  class  to  another  data  class,  a  data  class  to  a  state  class,  and 
a  data  class  to  a  script  class. 
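The first inter-class example, salDeptBound, can be sketched with two hypothetical Python classes in which an employee's `dept` property refers to a department object. The point of the sketch is that updating the department, a different object from the employee, can change whether the employee's assertion holds.

```python
from dataclasses import dataclass


@dataclass
class Department:
    budget: int


@dataclass
class Employee:
    salary: int
    dept: Department  # property whose value is an instance of another class


def sal_dept_bound(e: Employee) -> bool:
    """salDeptBound : salary <= (dept.budget / 10), evaluated through property selection."""
    return e.salary <= e.dept.budget / 10


research = Department(budget=600_000)
alice = Employee(salary=55_000, dept=research)
print(sal_dept_bound(alice))   # True: 55000 <= 60000
research.budget = 500_000      # modify the Department, not the Employee
print(sal_dept_bound(alice))   # False: 55000 > 50000
```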


Temporal  Assertion 


It would be hard to express assertions such as "An object can exist in the database between one instant
of time and another" without a tool for modeling long-term processes. With scripts, such assertions are
readily expressed as temporal assertions:

retireBeforeDate : retirementDate before $now

4.3 Exceptions and Exception Handlers

Each assertion is associated with an exception: when the assertion fails, the designated exception is
raised and handled by an appropriate exception handler. Deferring the discussion of temporal
assertions, we first describe exceptions raised by failures of intra-class or inter-class assertions.
Exceptions may be raised either at the beginning or at the end of transactions. The latter allows
assertion transactions (which permit temporarily inconsistent states) [Eswaran, 1976], whereas the
former does not. Despite this, we raise exceptions at the beginning of transactions, for two reasons.
1. When handlers need to interact with users and exceptions take some time to be removed, we could not
be sure when the system is consistent with the assertions. 2. If an exception is raised at the end, the
exception handler must undo the effects of the current transaction, and this introduces the danger of
the handler itself raising an exception.

In Taxis, whether an exception is raised is determined at the statement level, i.e., on each insertion
or modification, similar to INGRES [Stonebraker, 1976]. Removal cannot cause an assertion failure,
both because of the constraint that an object to be deleted must not be referenced by any other object
(otherwise, the operation is not carried out [Nixon, 1983]) and because of the restrictions we
currently impose on the types of assertion expressions. Retrieval does not cause an assertion failure.
Since both insertion and modification are treated as ordinary transactions, assertions are treated as
preconditions of these transactions. Exception handlers for assertion failures are therefore specified
where these transactions are called; thus exception handling for assertion failures fits uniformly into
the framework of exceptions raised by called transactions and handlers specified by caller transactions
[Wasserman, 1977], [Goodenough, 1975] or by caller scripts.
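The precondition treatment of assertions, with the handler supplied by the caller, can be sketched in Python as follows. The names `modify`, `AssertionFailure` and `raise_salary` are hypothetical; the essential point is that assertions are evaluated against the prospective state before the update is applied, so a failure leaves the stored object unchanged.

```python
from typing import Callable, Dict

Record = Dict[str, int]


class AssertionFailure(Exception):
    """Designated exception raised when an assertion would fail."""

    def __init__(self, assertion: str) -> None:
        super().__init__(assertion)
        self.assertion = assertion


def modify(record: Record, prop: str, value: int,
           assertions: Dict[str, Callable[[Record], bool]]) -> None:
    # Assertions act as preconditions: check the prospective state first,
    # and apply the update only if every assertion holds.
    prospective = {**record, prop: value}
    for name, holds in assertions.items():
        if not holds(prospective):
            raise AssertionFailure(name)
    record[prop] = value


def raise_salary(emp: Record, new_salary: int,
                 handler: Callable[[AssertionFailure], None]) -> None:
    # The caller, not the modification itself, decides how a failure is handled.
    try:
        modify(emp, "salary", new_salary,
               {"salMaxBound": lambda r: r["salary"] < 100_000})
    except AssertionFailure as exc:
        handler(exc)


emp = {"salary": 90_000}
failures = []
raise_salary(emp, 120_000, lambda exc: failures.append(exc.assertion))
print(emp, failures)   # {'salary': 90000} ['salMaxBound']
```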

Treating intra-class and inter-class assertions as preconditions for insertion and modification is
secure in the sense that the system will always be consistent with respect to them. What about temporal
assertions? If a temporal assertion is detected only after it has already failed (e.g., an employee is
promoted after his expected retirement date), the system is already inconsistent and the exception
handler faces the ordeal of restoring consistency. Unless an exception is raised some time before the
designated time, the system will fall into an inconsistent state. For this reason, the exception for
the failure of a temporal assertion is raised some time before the specified time, early enough for an
exception handler to remove the exception condition (3).
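Raising temporal-assertion exceptions early amounts to scheduling the check at the designated time minus a lead time. A minimal sketch, assuming the lead time is a `timedelta`; the 30-day `DEFAULT_LEAD` is a hypothetical placeholder, since the text states only that the compiler imposes some default (footnote 3).

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical default; the actual default lead time is chosen by the Taxis compiler.
DEFAULT_LEAD = timedelta(days=30)


def check_time(designated: datetime, lead: Optional[timedelta] = None) -> datetime:
    """Time at which the exception for a temporal assertion should be raised."""
    return designated - (lead if lead is not None else DEFAULT_LEAD)


def is_due(designated: datetime, now: datetime,
           lead: Optional[timedelta] = None) -> bool:
    """True once `now` has reached the (early) checking time."""
    return now >= check_time(designated, lead)


retirement = datetime(1985, 6, 30)
print(check_time(retirement))                     # 1985-05-31 00:00:00
print(check_time(retirement, timedelta(days=7)))  # 1985-06-23 00:00:00
```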

5  Implementation 

5.1  Linking  conditions  to  assertions 

Semantic integrity assertions for data classes, state classes and script classes are very similar to
conditions for transitions. The expressions allowed for assertions are the same as those for
conditions, except that condition expressions may also contain predicates for inter-script
communication or for user I/O. Assertion failures are detected automatically and raise exceptions,
while conditions evaluating to true or false may trigger transitions, changing their status from
enabled to fireable or from fireable to enabled, respectively.

(3) The amount of time for this purpose is passed as a parameter to the Taxis compiler. If it is not
passed, the compiler imposes a default value.

Treating assertions similarly to conditions allows us to consider only one of the two when describing
the techniques for efficient implementation of trigger mechanisms and automatic detection of assertion
failures. Here we describe the enforcement of assertions (4).

5.2 Optimization Techniques

The cost of allowing declarative specifications of assertions is, in general, high. The two major cost
components are [Lafue, 1982]:

selection, in a first phase, of the assertions to check after a database update;

checking, in a second phase, of the assertions in jeopardy, which requires accessing all the data
concerned with those assertions.

To  reduce  these  costs,  we  adapt: 

an  assertion  processor  with  compile  time  optimization  [Sarin,  1977] 
optimization  by  query  modification  [Stonebraker,  1975] 
use  of  redundant  data  [Bernstein,  1980] 

to  Taxis  for  efficient  implementation. 

Since the Taxis data model is based on semantic networks, the techniques need to be adapted
accordingly. The modification of an object at one node in the network may affect a number of objects
at various nodes. Suppose an expression ax that is a subexpression of assertion AX involves the
following factual property selection:

$P_1.P_2.\ldots.P_i.P_{i+1}.\ldots.P_n$, where each $P_i$ is a property of a class $C_i$ whose value is
an instance of a class $C_{i+1}$, for $1 \le i \le n$. Necessarily, AX is specified in terms of $P_1$ of
$C_1$ within $C_1$, and the value of the last property $P_n$ of class $C_n$ is an instance of a finitely
defined class: integer, boolean, string or enumeration. In other words, $C_1.P_1.P_2.\ldots.P_n = C_{n+1}$
and $C_i.P_i = C_{i+1}$ for $1 \le i \le n$, where $C_{n+1}$ is a finitely defined class.

The  following  may  help  one  understand  more  clearly  what  has  been  said: 


(4)  The  method  of  finding  predicates  for  conditions  that  evaluate  to  true  is  omitted  here  but  can  be 
found  in  [Chung,  1984]. 
dataClass Ci with
    attributeProperties
        Pi : Ci+1
endDataClass

dataClass Ci+1 with
    attributeProperties
        Pi+1 : Ci+2
endDataClass

for 1 <= i <= n-1


If we assume further that an object $O_k(C_i)$ is an instance of $C_i$ and that objects $O_k(C_{i+1})$
and $O_{k+1}(C_{i+1})$ are instances of $C_{i+1}$, for some $k$, then whenever the value of $P_i$ for
$O_k(C_i)$ is modified from $O_k(C_{i+1})$ to $O_{k+1}(C_{i+1})$, the value of
$P_1.P_2.\ldots.P_i.P_{i+1}.\ldots.P_n$ may change for all instances of $C_1$ whose
$P_1.P_2.\ldots.P_{i-1}$ value is $O_k(C_i)$.
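A small Python sketch of this argument, assuming a hypothetical object graph in which each `Obj` maps property names to other objects or to primitive values. `select` evaluates a property selection one step at a time, and `affected_roots` finds the instances of C1 whose P1. ... .P(i-1) value is the modified object Ok(Ci).

```python
from typing import Any, Dict, List, Sequence


class Obj:
    """Hypothetical object node: named, with a map from property names to values."""

    def __init__(self, name: str, **props: Any) -> None:
        self.name = name
        self.props: Dict[str, Any] = dict(props)


def select(value: Any, path: Sequence[str]) -> Any:
    """Evaluate value.P1.P2. ... .Pm by following one property per step."""
    for prop in path:
        value = value.props[prop]
    return value


def affected_roots(roots: List[Obj], path: Sequence[str], i: int, modified: Obj) -> List[Obj]:
    """Instances of C1 whose P1. ... .P(i-1) value is `modified` (1-based i).

    Only these can see P1. ... .Pn change when Pi of `modified` is updated;
    for i == 1 the prefix is empty, so `modified` itself is the only candidate.
    """
    prefix = path[: i - 1]
    return [r for r in roots if select(r, prefix) is modified]


# Employee.dept -> Dept.division -> Division.budget (an integer)
div_a, div_b = Obj("div_a", budget=100), Obj("div_b", budget=200)
d1, d2 = Obj("d1", division=div_a), Obj("d2", division=div_a)
staff = [Obj("e1", dept=d1), Obj("e2", dept=d2), Obj("e3", dept=d1)]
path = ("dept", "division", "budget")

print([e.name for e in affected_roots(staff, path, 2, d1)])  # ['e1', 'e3']
d1.props["division"] = div_b      # modify P2 of d1 from div_a to div_b
print([select(e, path) for e in staff])                      # [200, 100, 200]
```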

This argument is the core of the efficient implementation of semantic integrity assertions in Taxis,
in the following ways:

The assertion AX must be attached to the property $P_i$ of class $C_i$, for all $1 \le i \le n$, so that
only the relevant assertions are selected when an instance of $C_i$ is modified.

When an instance $O_k(C_i)$ of $C_i$ is modified and AX is attached to $P_i$ of $C_i$, only those
instances of $C_1$ whose $P_1.P_2.\ldots.P_{i-1}$ value is $O_k(C_i)$ have to be validated against AX,
for $1 \le i \le n$.

So that the comparison between the value before the modification and the value after it can eliminate
the iteration over instances of $C_1$ completely, the comparison must be made in terms of $O_k(C_i)$,
i.e., between $O_k(C_i).(P_i.\$old).\ldots.P_n$ and $O_k(C_i).(P_i.\$new).\ldots.P_n$, for all
$1 \le i \le n$.
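The three points can be combined into one hypothetical update hook, sketched below in Python. `build_index` attaches each assertion to every (Ci, Pi) pair on its path (first point); `on_modify` first compares the old and new values of the remaining selection (third point) and, only if they differ, looks for instances of C1 whose prefix reaches the modified object (second point). The example assertion `budgetFloor : dept.budget >= 100000` and all names are assumptions for illustration, and the sketch handles only assertions over a single selection path.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Sequence, Tuple


class Obj:
    """Hypothetical object: its class name and a map of property values."""

    def __init__(self, cls: str, **props: Any) -> None:
        self.cls = cls
        self.props: Dict[str, Any] = dict(props)


def select(value: Any, path: Sequence[str]) -> Any:
    for prop in path:
        value = value.props[prop]
    return value


@dataclass
class PathAssertion:
    name: str
    classes: Tuple[str, ...]      # C1 .. Cn
    path: Tuple[str, ...]         # P1 .. Pn
    test: Callable[[Any], bool]   # applied to the final, finitely defined value


Index = Dict[Tuple[str, str], List[Tuple[PathAssertion, int]]]


def build_index(assertions: List[PathAssertion]) -> Index:
    """First point: attach each assertion to Pi of Ci, for 1 <= i <= n."""
    index: Index = {}
    for a in assertions:
        for i, key in enumerate(zip(a.classes, a.path), start=1):
            index.setdefault(key, []).append((a, i))
    return index


def on_modify(index: Index, obj: Obj, prop: str, old: Any, new: Any,
              roots: List[Obj]) -> List[str]:
    """Names of assertions that setting obj.prop from old to new would violate."""
    violated = []
    for a, i in index.get((obj.cls, prop), []):
        rest = a.path[i:]                       # P(i+1) .. Pn
        new_value = select(new, rest)
        if select(old, rest) == new_value:      # third point: no C1 instance can change
            continue
        prefix = a.path[: i - 1]                # P1 .. P(i-1)
        if any(select(r, prefix) is obj for r in roots) and not a.test(new_value):
            violated.append(a.name)             # second point: only reachable roots matter
    return violated


budget_floor = PathAssertion("budgetFloor", ("Employee", "Dept"), ("dept", "budget"),
                             lambda b: b >= 100_000)
index = build_index([budget_floor])
d1, d2 = Obj("Dept", budget=150_000), Obj("Dept", budget=200_000)
e1 = Obj("Employee", dept=d1)
staff = [e1, Obj("Employee", dept=d2)]

print(on_modify(index, d1, "budget", 150_000, 80_000, staff))   # ['budgetFloor']
print(on_modify(index, e1, "dept", d1, d2, staff))              # []
```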

The first point reduces the first of the two major cost components, i.e., the selection of assertions
to check after a database update: with it, only the assertions relevant to a given update are selected.
The treatment of assertions as preconditions of insertions and modifications, mentioned previously,
together with the second point, which acts as a further qualifier, is related to the technique of
optimization by query modification [Stonebraker, 1975]. The third point is required in order to adapt
the assertion processor techniques [Sarin,
1977] as well as those that use maximum and minimum values as redundant data [Bernstein, 1980]. A
detailed description of the method, with illustrations, can be found in [Chung, 1984].

5.3 Some Restrictions on Expressions

Expressions to which the optimization techniques described in Section 5.2 cannot be applied, and those
to which the techniques can be applied but only at a high run-time cost, are not allowed in the current
version of Taxis.
Property Selection

A  property  selection  needs  to  be  done  through  a  characteristic  property  or  an  attribute  property  for 
state  classes  and  data  classes,  and  through  a  local  property  or  a  parameter  property  for  script  classes. 

Aggregate  Assertions 

Assertions involving transaction calls, whether for statistical values such as maximum, minimum or
average or for other general purposes, are not allowed. For assertions with transaction calls for
statistical values, the deletion constraint may be violated, as noted in [Stonebraker, 1975], even
though the optimization techniques can be applied to them. Assertions with transaction calls for other
than statistical values generally cannot use the techniques at all.

Temporal  Assertions 

The only operators allowed with $now are before and after, and the other operand of $now must be an
instance of time. We further require that neither arithmetic nor property selection be used in a
temporal assertion. Some of the major reasons are:

since the execution time of a transaction, which is treated as the atomic unit of execution, cannot be
predicted in advance and usually depends on the load of the underlying computer system, neither the
equality operator nor the inequality operator is allowed;

the time at which a temporal assertion is to be checked cannot, in general, be determined if arithmetic
is involved;

since temporal assertions are checked some time before the designated time, it is important to
minimize the time needed to compute the checking time, but property selection may require iteration
over some classes and so increase that time.

Nothing 

One of the main reasons why nothing (a special null value in Taxis) is not allowed in an assertion
expression is that, if an expression combines nothing with the boolean operators and or or, neither
compile-time nor run-time optimization of those boolean operators is possible. Moreover, the semantics
of such expressions are not clear.

$old and $new

These must be used as a symmetric pair to avoid run-time overhead. Other uses also appear to be
infrequent in assertion expressions.

Multiple Perturbations

This term comes directly from [Sarin, 1977], where it is explained that an assertion such as "an
employee's salary must be less than the salary of his manager, who is also an employee" generates $2^n$
assertions, where $n$ is the number of repetitions of the same class. In Taxis, assertions whose
property selections involve objects of the same class as the one in which the assertion is specified
fall into this category. Besides the need to create artificial classes for such assertions in Taxis,
it is not clear what an assertion means when both sides deal with the same object [Sarin, 1977]. For
these reasons, property selections must not involve the class in which they are specified.

6  Summary 

We have presented semantic integrity assertions together with the new features of Taxis concerned with
them; these assertions are specified declaratively and maintained automatically. We have described
efficient techniques for enforcing them, along with the restrictions on expressions needed to achieve
that efficiency. Finally, we have described how the trigger mechanism for transitions is associated
with the detection of assertion failures, so that the techniques for enforcing assertions efficiently
also apply to the trigger mechanism, yielding efficient management of long-term processes.
Abstract

For the most part, the process of building information systems has been viewed as problematic and
inefficient from end user, management, and engineering standpoints. The need for greatly improved
approaches to information systems development is quickly becoming apparent throughout the data
processing industry. The TAXIS project at the University of Toronto promotes the idea of applying a
software development methodology, which we call "concept specialization," to the life cycle of an
information system. The development of information systems presents particular problems which are
different from those of other software disciplines, such as scientific software or system software.
Concept specialization, augmented with appropriate automated tools, intentionally addresses the
problems presented by information systems development. This paper outlines some of the concepts behind
the TAXIS system and describes an integrated interactive design environment that assists the designer
during the conceptual modeling phase. The design environment, called TAXIED, is implemented on a VAX
780 under the UNIX operating system.


1.  Introduction 

Most information systems tend to be extremely complex and, unfortunately, the process of building
information systems has been viewed as problematic and inefficient from engineering, management and
end user standpoints. A major portion of the difficulty in developing information systems can be
attributed to the lack of a systematic approach to specification, design and implementation [10]. The
TAXIS project at the University of Toronto addresses the problems involved with information systems
development and proposes a comprehensive approach that can potentially lead to substantial
improvements.

Conventional approaches for developing information systems are proving to be inadequate in many ways.
Information system developers have been plagued with productivity problems, schedule slippages, cost
overruns, and poor quality and reliability in the resulting systems. The development of information
systems entails a life cycle which is analogous to the life cycle concept used in manufacturing
environments, where a product is first conceived, specified in detail, designed, and then built and
maintained until it is no longer usable. The information systems life cycle process is impeded by
several factors, including the lack of consistent methodologies, poor integration between stages, and
the absence of automated tools to support the activities involved at each step. Moreover, the design
techniques currently in use do not address the complexities involved in building information systems.

The TAXIS system attempts to improve the life cycle process in several ways, including eliminating
stages, improving the vertical integration throughout the life cycle, and providing automated tools to
assist the designers and developers of application systems. The TAXIS system concentrates on the
requirements specification and conceptual design stages of the information systems development life
cycle.

After the conceptual model has been described, the system generates the data base and programs which
form the application's production system. The requirements specification language [5] and the
conceptual design language [12] [1] use the same underlying abstraction principles. We envisage both of
these stages of the life cycle having complementary design environments that facilitate and emphasize
the activities involved at each stage. The goal of the TAXIS project is to provide a comprehensive
information systems development environment that covers all aspects of the development life cycle.

By providing a systematic approach to information systems development, the TAXIS system offers major
improvements for the life cycle process. TAXIS simplifies the time-consuming logical and physical data
base design, coding, and testing phases by automatically generating application systems from the
conceptual model. The generated application system can be used as is for the production system, or it
can be further modified if efficiency is a major concern. Both the requirements specification language
and the conceptual modeling language are based on the same design methodology, which simplifies the
transition between the phases. The design methodology, called concept specialization, is intentionally
directed towards the complexity problems encountered by information systems designers. We have
completed development of a comprehensive design environment for the conceptual design stage [9], which
consists of an integrated set of interactive tools, and of a translator [8], which generates a
Pascal/R program from the conceptual model.

This environment supports the activities performed at the conceptual design stage and also simplifies
maintenance by facilitating changes and enhancements to the model. Providing a systematic approach to
information systems development will not only make the designer's task easier but will also improve
the quality and reliability of the software system [10].

The design environment consists of an integrated set of interactive tools which support the spirit of
the concept specialization methodology. These include a class-oriented editor, a semantic consistency
verifier, and an interpreter and debugger used for prototyping. The editor provides the designer with
facilities to construct, inspect, and modify a TAXIS program. The consistency verifier performs various
checks to ensure the correctness of the conceptual model being specified. The interpreter simulates
execution of TAXIS programs, and the debugger assists the designer during the testing stage by
providing several debugging aids. The design environment also provides various system utilities that
support the interactive phase, including a help facility, a documentation generator, and a program
maintenance system.


The TAXIS language [12] [1] is used for describing the conceptual design specification, while RML [5],
which has been designed to be compatible with TAXIS, is used for documenting the requirements
specification. A requirements specification states explicitly what the requirements are, while the
design specification which follows describes how to achieve the requirements. The requirements
specification documents the task of gathering all of the relevant information to be used in
understanding a real world situation prior to system development. The output of the requirements
definition phase is used as input to the conceptual modeling phase. In this paper we concentrate on the
conceptual design stage and the tools designed to facilitate the activities involved during this phase
of the information systems development life cycle.

At the conceptual level the designer describes both the data and the processes that constitute the
model being defined. The designer uses an interactive design and prototyping facility to specify the
conceptual model, and then the TAXIS system automatically generates the database and the application
programs for the production system (figure 1). We are interested in describing models which are closer
to the human conceptual view than to the machine level, since a description at the conceptual level is
more natural and helpful for the application designer in organizing and integrating details into a
coherent whole.

In [2], we presented an overview of the TAXIS project and briefly outlined some aspects of the design
environment for the conceptual modeling stage. In this paper we take a closer look at the design
environment and report on our recent efforts in this area. In the next section we review some of the
principles behind the TAXIS project and present a brief overview of the conceptual design language. In
section 3 we discuss TAXIED [9], the design environment for the conceptual modeling phase. Section 4
presents a sample TAXIED session illustrating some of the activities performed at the conceptual design
stage. We conclude with a summary of our work in section 5.

2. Overview of the TAXIS System

We are interested in identifying a small number of fundamental principles that can successfully
provide a basis for both requirements and design specification languages [6]. The TAXIS project has
been strongly guided by the premise that there exists a set of abstraction mechanisms that can be used
to structure all parts of a specification. We believe that people find it natural and expedient to
organize their view of the world in terms of a small number of appropriate abstraction mechanisms.
Moreover, we contend that a framework based on these mechanisms facilitates a systematic and
productive progression from initial world-oriented system design and implementation specifications.

2.1 Principles

The TAXIS design methodology, which is based on the generalization abstraction, is called "concept
specialization," and it offers a mechanism for organizing the details of an information system.
Although the development of information systems by application programmers is usually considered to be
mundane and boring, it is still a somewhat error-prone process. Unlike system software or scientific
software, the algorithms involved tend to be quite straightforward; yet there is a multitude of simple
detail which has to be captured, with consistency and completeness at a premium. Structured techniques
which address the problems of algorithmic complexity do not help the designer in managing this
complexity of detail. Indeed, we believe that most information systems involve large amounts of simple
detail and that taxonomies offer a fundamental tool for coping with such situations.

The seminal idea of the TAXIS design methodology is that a model can be constructed by modeling first,
in terms of classes, the most general concepts and tasks in the application area, and then proceeding
to deal with special cases through more specialized subclasses. For example, in building a student
enrollment system for a university, one might consider first the concepts of student and course, and
the task of enrolling a student in a course. Later, the designer can differentiate between graduate
and undergraduate students and courses, day and evening classes, full-time and part-time students,
etc., and the regulations concerning them. Concept specialization enables the designer to take a bird's
eye view of the application and then successively swoop lower and lower until enough detail is captured
to form an accurate model.
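Concept specialization has a rough analogue in subclassing, sketched here in Python (not TAXIS syntax). The general concepts and the enrollment task are written once for `Student`; the hypothetical subclasses then add properties or refine a regulation. The limit of five courses, and of two for part-time students, is invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Course:
    code: str


@dataclass
class Student:
    name: str
    courses: List[Course] = field(default_factory=list)
    max_courses = 5  # hypothetical general regulation (class attribute, not a field)

    def enroll(self, course: Course) -> None:
        # The general task, written once for all students.
        if len(self.courses) >= self.max_courses:
            raise ValueError(f"{self.name} may not take more than {self.max_courses} courses")
        self.courses.append(course)


@dataclass
class PartTimeStudent(Student):
    max_courses = 2  # special case: refines only the regulation


@dataclass
class GraduateStudent(Student):
    supervisor: str = ""  # special case: adds a property


p = PartTimeStudent("Pat")
p.enroll(Course("CSC148"))
p.enroll(Course("CSC165"))
try:
    p.enroll(Course("CSC207"))
except ValueError as err:
    print(err)   # Pat may not take more than 2 courses
```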

The TAXIS language also uses the classification abstraction mechanism, which allows the designer to
consider only the characteristics shared by the instances of a class and to ignore individual
differences. A new entity, called a class, captures the common characteristics shared by a group of
entities. This group of objects is called the members or instances of the class.

The definition of a class includes a list of properties common to all of the members but omits details
which are particular to each instance. For example, the class STUDENT, representing the class of all
students in a university, would have properties such as "every student has an age," but the particular
age of each student would not be included in this abstraction. Classification can be applied to classes
as well as objects. Thus a metaclass is a collection of classes with similar qualities. For example, the
metaclass STUDENTCLASS might have as instances STUDENTS, UNDERGRADUATES, GRADUATES, and so on.

A third abstraction mechanism, called aggregation, allows an entity to be viewed as a collection of
components or parts. This abstraction dimension is orthogonal to classification. It lets the designer
consider an entity while ignoring or temporarily deferring the details of its properties. Thus the
class STUDENT may participate in a description such as CLASS_ENROLLMENT before specifying that a
student consists of the properties name, address, age, and student_number.
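Python happens to offer close analogues of the three mechanisms, so a hypothetical sketch may help fix the ideas: a class lists the properties shared by its instances (classification), a metaclass such as `StudentClass` has classes as its instances, and an aggregate such as `ClassEnrollment` is built from parts. The string annotation `"Student"` lets `ClassEnrollment` mention students before the `Student` class and its properties are defined. These correspondences are approximate, not TAXIS semantics.

```python
from typing import List


class StudentClass(type):
    """Metaclass: its instances are classes such as Student, Undergraduate and Graduate."""

    members: List[str] = []

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        mcls.members.append(name)   # record each class created with this metaclass
        return cls


class ClassEnrollment:
    """Aggregation: an enrollment viewed as a whole made of a student and a course."""

    def __init__(self, student: "Student", course: str) -> None:
        # "Student" is a forward reference: its properties are not yet specified here.
        self.student = student
        self.course = course


class Student(metaclass=StudentClass):
    """Classification: properties every student has; values belong to each instance."""

    def __init__(self, name: str, address: str, age: int, student_number: str) -> None:
        self.name = name
        self.address = address
        self.age = age
        self.student_number = student_number


class Undergraduate(Student):
    pass


class Graduate(Student):
    pass


print(StudentClass.members)   # ['Student', 'Undergraduate', 'Graduate']
```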

2.2  Conceptual  Design  Language 

The  TAXIS  language  distinguishes  several 
different  types  of  classes  which  can  be  used 
when  modeling  the  data  and  events  that  constitute 
the  conceptual  schema.  The  language  offers 
constructs  which  enable  the  designer  to  define 
taxonomies  of  classes  representing  data  structures, 
transactions,  exceptions  and  their  handlers,  as 
well  as  scripts  that  specify  the  patterns  of 
user-system  dialogues.  Since  we  are  more 
concerned  with  the  design  environment,  we 
only  present  a  brief  outline  of  the  language 
facilities. A more detailed description is
given in [7].

Processes  are  modeled  using  scripts  and 
transactions.  Scripts  provide  a  Petri  net-like 
control  structure  for  modeling  interaction  and 
communication.  Each  script  is  essentially  a 
Petri  net  with  parameters,  local  variables,  states 
and  state  transitions.  In  turn,  each  transition 
consists  of  conditions,  which  must  be  true  for  the 
transition  to  fire,  and  actions,  which  are  to  be 
carried  out  if  the  transition  fires.  In  order 
to  enable  communication,  scripts  can  employ 
operators  for  message  passing  between  a  script 
and  a  terminal,  or  more  generally  between  any 
two  scripts.  Scripts  are  intended  to  model 
real  world  activities  which  have  significant 
duration  and  which  have  elaborate  communication 
and  synchronization  mechanisms.  For  example, 
scripts  can  be  used  to  model  the  university  career 
of  a  student  and  the  activities  of  enrolling  in 
courses,  obtaining  marks,  passing  or  failing 
courses,  meeting  degree  requirements,  etc. 
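The core of the Petri net control structure can be sketched in a few lines of Python. This minimal sketch rests on stated assumptions, and it is not TAXIS's own execution model:

- tokens in named states show where the script currently is;
- conditions and actions are plain functions over a dictionary of script locals;
- parameters and message passing between scripts are omitted;
- the student-career states and the three-course rule are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Env = Dict[str, Any]  # the script's local variables


@dataclass
class Transition:
    name: str
    inputs: List[str]   # states that must hold a token for the transition to fire
    outputs: List[str]  # states that receive a token when it fires
    condition: Callable[[Env], bool] = lambda env: True
    action: Callable[[Env], None] = lambda env: None


@dataclass
class Script:
    marking: Dict[str, int]  # tokens per state
    transitions: List[Transition]
    env: Env = field(default_factory=dict)

    def enabled(self, t: Transition) -> bool:
        return all(self.marking.get(s, 0) > 0 for s in t.inputs) and t.condition(self.env)

    def fire(self, t: Transition) -> None:
        if not self.enabled(t):
            raise RuntimeError(f"transition {t.name!r} is not enabled")
        for s in t.inputs:
            self.marking[s] -= 1
        for s in t.outputs:
            self.marking[s] = self.marking.get(s, 0) + 1
        t.action(self.env)

    def step(self) -> bool:
        """Fire the first enabled transition; return False when none is enabled."""
        for t in self.transitions:
            if self.enabled(t):
                self.fire(t)
                return True
        return False


# Hypothetical, much simplified university career of one student.
career = Script(
    marking={"admitted": 1},
    transitions=[
        Transition("enrol", ["admitted"], ["studying"],
                   action=lambda env: env.update(credits=0)),
        Transition("pass_course", ["studying"], ["studying"],
                   condition=lambda env: env["credits"] < 3,
                   action=lambda env: env.update(credits=env["credits"] + 1)),
        Transition("graduate", ["studying"], ["graduated"],
                   condition=lambda env: env["credits"] >= 3),
    ],
)
while career.step():
    pass
```

Firing the first enabled transition is an arbitrary choice made here for determinism. A Petri net proper leaves the choice among enabled transitions open.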

Transactions,  in  contrast  with  scripts,  are 
intended  for  modeling  relatively  short-term  events 
which  execute  without  user  interaction.  For 
example,  assigning  a  grade  to  a  student  for  a 
particular  course  would  be  modeled  using  a 
transaction.  Transactions  have  parameters,  locals, 
preconditions,  actions,  postconditions,  and 
returns  properties.  Action  properties  are  used 
to  retrieve  and  manipulate  objects  from  the  data 
base. Their syntax resembles high-level, Pascal-like
code with control loops, structured
constructs,  and  relational  DML  operators. 
Transactions  can  optionally  return  a  value 
through  the  returns  property. 

The preconditions, which take the form of
boolean assertions, must be satisfied before
the actions are carried out. Similarly, the
postconditions must hold after the actions are
performed. If an assertion fails, an instance
of an exception class is raised, and control is
passed to an exception handler class which is
specified by the caller of the transaction where
the exception is raised. The exception handling
mechanism, which is similar to [11], is a useful
means of dealing with over-abstraction. The
traditional view that "abnormal" occurrences
be dealt with "in-line" appears to run counter
to the natural flow of description: one has to
constantly take detours from describing how
students usually enroll in courses in order to
say what is to be done in rare special cases.
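The precondition, action and postcondition cycle, together with the caller-supplied handler, can be sketched as follows. This is a hypothetical Python illustration, not TAXIS semantics:

- each assertion is a (name, test, exception class) triple;
- the caller's handler is an ordinary function rather than a handler class;
- the grade-assignment transaction and its 0 to 100 scale are invented for the example.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Env = Dict[str, Any]
Assertion = Tuple[str, Callable[[Env], bool], type]  # (name, test, exception class)


class TransactionException(Exception):
    """Base for the exception classes raised when an assertion fails."""


class InvalidGrade(TransactionException):
    pass


class GradeNotRecorded(TransactionException):
    pass


def run_transaction(env: Env,
                    preconditions: List[Assertion],
                    actions: List[Callable[[Env], None]],
                    postconditions: List[Assertion],
                    returns: Optional[Callable[[Env], Any]] = None) -> Any:
    for name, test, exc in preconditions:   # must hold before any action runs
        if not test(env):
            raise exc(name)
    for action in actions:
        action(env)
    for name, test, exc in postconditions:  # must hold after the actions
        if not test(env):
            raise exc(name)
    return returns(env) if returns else None


def assign_grade(db: Dict[Tuple[int, str], int], student: int, course: str,
                 grade: int, handler: Callable[[TransactionException], Any]) -> Any:
    env = {"db": db, "student": student, "course": course, "grade": grade}
    try:
        return run_transaction(
            env,
            preconditions=[("valid_grade?", lambda e: 0 <= e["grade"] <= 100, InvalidGrade)],
            actions=[lambda e: e["db"].__setitem__((e["student"], e["course"]), e["grade"])],
            postconditions=[("recorded?",
                             lambda e: e["db"].get((e["student"], e["course"])) == e["grade"],
                             GradeNotRecorded)],
            returns=lambda e: e["grade"],
        )
    except TransactionException as exc:
        return handler(exc)  # the caller, not the transaction body, decides what to do
```

The normal path reads straight through, and the rare cases live in the handler the caller passes in. This sketch does not undo the actions when a postcondition fails. The text does not say whether TAXIS rolls them back.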

To model objects and relationships between
objects, TAXIS provides variable data classes
which are analogous to Codd's relations. Variable
data classes have key, characteristic, and attribute
properties. Characteristics are properties
that cannot be modified (keys are a subset of the
characteristic properties). For example, when
describing the class STUDENTS we might specify
that student_number is a key; that student_number
and name are characteristic properties;
and that address, age, and current year are
attribute properties. The set of instances of
a variable class can be manipulated by INSERT
and DELETE operators. Variable classes can
have boolean assertions associated with them
that are checked whenever an instance is modified
in any way.
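A variable data class with its three kinds of properties might look roughly like this in Python. The sketch is hypothetical: `VariableClass`, its methods and the non-negative-age assertion are illustrative names, not TAXIS constructs. INSERT and DELETE become `insert` and `delete`. Characteristic properties reject modification, and every change re-checks the class assertions before it is committed.

```python
from typing import Any, Callable, Dict, Iterable, Tuple

Instance = Dict[str, Any]


class VariableClass:
    """Hypothetical Python analogue of a TAXIS variable data class."""

    def __init__(self, keys: Iterable[str], characteristics: Iterable[str],
                 attributes: Iterable[str],
                 assertions: Iterable[Callable[[Instance], bool]] = ()) -> None:
        self.keys = tuple(keys)
        self.characteristics = set(characteristics)
        self.attributes = set(attributes)
        if not set(self.keys) <= self.characteristics:
            raise ValueError("keys must be a subset of the characteristic properties")
        self.assertions = list(assertions)
        self.instances: Dict[Tuple[Any, ...], Instance] = {}

    def _check(self, inst: Instance) -> None:
        if not all(test(inst) for test in self.assertions):
            raise ValueError("class assertion violated")

    def insert(self, **props: Any) -> Tuple[Any, ...]:
        unknown = set(props) - self.characteristics - self.attributes
        if unknown:
            raise KeyError(f"unknown properties: {sorted(unknown)}")
        key = tuple(props[k] for k in self.keys)
        if key in self.instances:
            raise KeyError(f"duplicate key {key}")
        self._check(props)
        self.instances[key] = dict(props)
        return key

    def delete(self, key: Tuple[Any, ...]) -> None:
        del self.instances[key]

    def update(self, key: Tuple[Any, ...], prop: str, value: Any) -> None:
        if prop in self.characteristics:
            raise AttributeError(f"{prop} is a characteristic and cannot be modified")
        if prop not in self.attributes:
            raise KeyError(f"unknown attribute: {prop}")
        changed = {**self.instances[key], prop: value}
        self._check(changed)  # assertions are checked on every modification
        self.instances[key] = changed


students = VariableClass(keys=["student_number"],
                         characteristics=["student_number", "name"],
                         attributes=["address", "age", "year"],
                         assertions=[lambda s: s.get("age", 0) >= 0])
k = students.insert(student_number=45901, name="Steve Jones", age=19)
```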

TAXIS has other types of classes which can
be used for data structuring. These classes
provide some of the conventional means for
structuring data that are found in programming
languages. For example, TAXIS has enumerated
data types; primitive data types such as integer,
boolean, and character; aggregate class types
which are similar to Pascal's record types; and
form-defined classes which allow the designer
to specify various formats for data (e.g. DDD-DDD-DDDD
for a phone number). TAXIS also has
test-defined classes which are defined by
associating to the class a boolean-valued transaction
that returns true if and only if its
argument is an instance of the class.
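Form-defined and test-defined classes both amount to membership tests. The hypothetical Python sketch below makes three substitutions:

- a form such as DDD-DDD-DDDD is compiled into a regular expression, assuming that D stands for any digit and every other character is literal;
- a test-defined class wraps a boolean function;
- Python's `Enum` stands in for an enumerated data type.

```python
import re
from enum import Enum
from typing import Any, Callable


class Status(Enum):  # enumerated data class
    FULL_TIME = "full-time"
    PART_TIME = "part-time"


class FormDefinedClass:
    def __init__(self, form: str) -> None:
        # Assumption: 'D' means any digit; every other character must match literally.
        pattern = "".join(r"\d" if ch == "D" else re.escape(ch) for ch in form)
        self._regex = re.compile(pattern)

    def __contains__(self, value: str) -> bool:
        return self._regex.fullmatch(value) is not None


class TestDefinedClass:
    def __init__(self, test: Callable[[Any], bool]) -> None:
        self._test = test  # returns True exactly for instances of the class

    def __contains__(self, value: Any) -> bool:
        return bool(self._test(value))


phone_number = FormDefinedClass("DDD-DDD-DDDD")
even_integer = TestDefinedClass(lambda x: isinstance(x, int) and x % 2 == 0)
```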

Figure 2 gives an example of a variable class
and a transaction class. The STUDENT variable
class has a key property student_number, two
characteristic properties student_number and
name, and several attribute properties. The
ENROL transaction takes four parameters: a
student, course, semester, and section_number.
Before enrolling a student in a course we first
verify that the class exists and that there is
enough space in the class for the student. This
is handled through the precondition properties
"class_there?" and "enough_space?". Note that
both preconditions have specified exceptions
that will be raised if their assertions fail.

The  action  properties  "insert_student"  and 
"update_class_size"  perform  the  actual  enrolling 
process  of  creating  an  instance  of  the  ENROLLMENT 
class  and  updating  the  number  of  students  enrolled 
in  the  specified  class. 
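For readers more used to conventional code, here is a rough Python transliteration of the ENROL transaction in Figure 2. It is a sketch, not generated TAXIS output. The in-memory lists, the `Section` record, the `max_students` name (standing for the figure's `max#`) and the exception classes are assumptions. The class is also looked up once here, rather than once per precondition as in the figure.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Section:
    """One offering of a course; mirrors the 'class' objects of Figure 2."""
    course: str
    semester: str
    section_number: int
    max_students: int  # hypothetical name for the figure's max# property
    num_of_students: int = 0


class NoSuchClass(Exception):
    pass


class ClassFilled(Exception):
    pass


def find_section(sections: List[Section], course: str, semester: str,
                 section_number: int) -> Optional[Section]:
    for cl in sections:
        if (cl.course, cl.semester, cl.section_number) == (course, semester, section_number):
            return cl
    return None


def enrol(sections: List[Section], enrollment: List[Dict[str, object]],
          student: int, course: str, semester: str, section_number: int) -> None:
    cl = find_section(sections, course, semester, section_number)
    if cl is None:                                # precondition class_there?
        raise NoSuchClass(student, course, semester, section_number)
    if not cl.max_students > cl.num_of_students:  # precondition enough_space?
        raise ClassFilled(student, cl)
    enrollment.append({"student": student, "course": course,  # action insert_student
                       "semester": semester, "section": section_number})
    cl.num_of_students += 1                       # action update_class_size
```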

The TAXIS design methodology suggests that
the designer should start by defining the most
general naturally occurring classes of objects
and events in the domain; this is to be
accomplished by the use of named properties which
connect related concepts and through the use of
assertions which restrict the potential relationships.
Further details of the proposed system are
then introduced in successive iterations by
describing subclasses of already presented classes,
and specializing transactions and scripts in order
to deal with the objects in these classes.

Sample TAXIS Classes

variable student
  keys
    student: (student_number);
  characteristics
    student_number: integer;
    name: string;
  attributeproperties
    address: address_value;
    age: integer;
    year: { 'freshman', 'sophomore',
            'junior', 'senior', 'graduate' };
    faculty: faculty_value;
    status: { 'full-time', 'part-time' };
end

transaction enrol with
  parameters
    enrol: (s, c, sem, s#);
  locals
    s: student;
    c: course;
    sem: semester;
    s#: { 1 .. 15 };
    cl: class;
  preconditions
    class_there?:
      begin
        getobject cl from class with
          (cl.semester = sem and
           cl.course = c and cl.section# = s#)
      end
      (cl <> nothing)
      exc no_such_class (s, cl);
    enough_space?:
      begin
        getobject cl from class with
          (cl.semester = sem and
           cl.course = c and cl.section# = s#)
      end
      (cl.max# > cl.num_of_students)
      exc class_filled (s, cl);
  actions
    insert_student:
      insertobject cl in enrollment with
        (student: s, course: c,
         semester: sem, section#: s#);
    update_class_size:
      cl.num_of_students <- cl.num_of_students + 1;
end

Figure 2

The result is a hierarchy (taxonomy) of object,
transaction, and script classes on which inheritance
operates to abbreviate natural redundancy without
losing the benefit of being able to check
consistency. Once the usual/normal aspects of the
system are described to some level of detail, the
designer can describe, using the same methodology,
the exceptions raised by the failure of assertions
and the exception handling mechanisms for them.
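Inheritance over such a taxonomy can be sketched in a small hypothetical Python model. Each class stores only the properties it adds or specializes, and its full property list is computed by walking up the isa chain. A redefined property is accepted only if its new value class is a subclass of the inherited one. An example is graduate narrowing year from year_value to graduate_year_value in the session of Section 4. That acceptance rule is an assumption about how specialization might be checked, not a description of the TAXIS implementation.

```python
from typing import Dict, List, Optional


class TaxisClass:
    """Hypothetical sketch: a class records only the properties it adds or specializes."""

    def __init__(self, name: str, isa: Optional["TaxisClass"] = None,
                 props: Optional[Dict[str, "TaxisClass"]] = None) -> None:
        self.name = name
        self.isa = isa
        self.own = dict(props or {})

    def is_a(self, other: "TaxisClass") -> bool:
        c: Optional[TaxisClass] = self
        while c is not None:
            if c is other:
                return True
            c = c.isa
        return False

    def properties(self) -> Dict[str, "TaxisClass"]:
        # Start from everything inherited, then let this class's own definitions win.
        merged = self.isa.properties() if self.isa else {}
        merged.update(self.own)
        return merged

    def specialization_errors(self) -> List[str]:
        # A redefined property must range over a subclass of the inherited value class.
        inherited = self.isa.properties() if self.isa else {}
        return [f"{self.name}.{p}: {v.name} is not a {inherited[p].name}"
                for p, v in self.own.items()
                if p in inherited and not v.is_a(inherited[p])]


integer, string = TaxisClass("integer"), TaxisClass("string")
year_value = TaxisClass("year_value")
graduate_year_value = TaxisClass("graduate_year_value", isa=year_value)
faculty_member = TaxisClass("faculty_member")
student = TaxisClass("student", props={"student_number": integer, "name": string,
                                        "year": year_value})
graduate = TaxisClass("graduate", isa=student,
                      props={"advisor": faculty_member, "year": graduate_year_value})
```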

We  feel  that  the  above  methodology  is  conducive 
to  a  natural  style  of  description  because  it  is 
oriented  toward  conceptual  objects  and  activities 
occurring  in  the  user's  world. 

3.  TAXIS  Design  Environment 

To  support  the  conceptual  modeling  phase 
we  provide  a  comprehensive  and  coherent  set  of 
tools  that  not  only  work  together,  but  also  support 
the  concept  specialization  design  methodology. 

The process of designing and implementing information
systems is a complex task. Instead of
supplying the designer with one tool that encompasses
all aspects of conceptual modeling, we
offer an entire set of specialized tools that
can be utilized during the process of designing
and implementing an information system. The
best tools are simple and address a specific
problem; thus they are easily used and are
clearly appropriate for persons having such a
problem. A tool should not be considered an
end in itself, but rather something that supports
an activity as a means to an end. The final
criterion for judgment is whether or not the
tool improves the productivity of its users.

We  maintain  that  the  tools  can  be  of  substantial 
value  only  when  deployed  within  the  context  of 
a  systematic  approach  to  software  development. 

There are many fine examples of general-purpose
online systems for the development of
applications. For example, the UNIX programming
support environment has received notable attention.
The TAXIS design environment TAXIED combines
tools which are normally associated with software
development, such as editors, interpreters,
and debuggers, together with data base design
tools such as consistency checkers, documentation
generators, and versioning controllers. These
tools, along with the TAXIS language, are closely
integrated within an interactive system to provide
an effective user-design environment. TAXIED is
specifically designed to support the activities
involved when developing applications using the
TAXIS design methodology. This coupling of tools
and language to support the TAXIS design methodology
is essential if we want to reduce the
complexity of designing, developing, testing and
maintaining large information systems.

To  discuss  the  components  of  the  TAXIED 
system,  we  have  classified  the  components  into 
categories  according  to  their  use. 

3.1  Schema  Manipulation  Tools 

The TAXIED editor is "class-based" and
"syntax-oriented." By "class-based" we mean that
the editor supports and encourages the TAXIS class
concept, which utilizes the abstraction techniques
of classification, generalization and aggregation.
The designer incrementally describes the TAXIS
program, using the specialization technique, on a
class-by-class basis. The editor allows the
designer to focus attention on a particular class
when describing and modifying properties for that
class. The editor is "syntax-oriented" in that
it is sensitive to the syntax of the TAXIS
language, helping the user with formatting,
structuring, and syntax checking. All properties
and class descriptions are syntactically correct
after editing is completed.

One  of  the  most  important  features  of  the 
editor  is  its  ability  to  support  the  concept 
specialization  methodology  of  TAXIS.  When 
defining  a  new  class,  which  is  a  specialization 
of  an  existing  class,  the  inherited  properties  of 
predecessor classes are displayed. The designer
does not have to copy identical parts of descriptions;
he only specifies new properties or
specializes existing ones. This tool removes from
the designer the error-prone task of copying
identical parts of descriptions without imposing
the burden of remembering what properties would
be inherited. The editor also provides facilities
for adding, deleting, or changing existing
properties that are defined for a particular class.

The graphics and formatter component provides
the capability to display portions of the TAXIS
program while emphasizing the three main abstraction
principles supported by the TAXIS design
methodology. The designer can request to see
formatted descriptions of TAXIS classes including
or excluding inherited property definitions
(aggregation principle). The designer can also
request to see graphical representations of
successor or predecessor isa hierarchies
(generalization principle). Finally, the designer
can request to see instances of classes at the
simple, meta or metameta class level (classification
principle).

3.2  Static  Semantic  Analysis 

The  static  semantic  analysis  tools  aid  in  the 
consistency  checking  of  the  conceptual  schema. 
Three  separate,  but  interrelated,  tools  were 
designed  to  help  the  designer  in  verifying  the 
semantic integrity of a TAXIS program. They consist
of a consistency checker, an assertion
checker,  and  an  external  assertion  checker. 

The responsibility of the consistency
checker is to verify that the syntactic requirements
of the TAXIS language are being upheld.
Numerous consistency checks are performed
including type checking, specialization, and
argument passing for parameters and exceptions.

The consistency checker is similar to the type
checkers proposed for Ada language compilers.
The syntax checker encourages and supports the
TAXIS design methodology by operating on a
class-by-class basis. The TAXIED system forces the
designer to verify the consistency of the
conceptual schema before performing dynamic
analysis.

An important consideration when performing
static semantic analysis on the conceptual schema
is to verify that specialized transaction assertions
imply their more general forms. For example,
consider a transaction REGISTER that simulates the
process of registering freshman students in the
university. We might have a constraint that a
student's high school grade point average must be
greater than 3.0. This can be expressed by the
precondition:

check_gradepoint_average?:
  (student.highschool_gradepoint > 3.0)
  exc low_gradepoint (student);

Various colleges within the university have their
own requirements for grade point averages, some
being a little stricter. When specializing the
general REGISTER transaction to
REGISTER_AT_PRESTIGIOUS_COLLEGE, the grade point constraint
would have to be a proper specialization. For
example:

check_gradepoint_average?:
  (student.highschool_gradepoint > 3.5)
  exc low_gradepoint (student);

Related  to  this  is  the  problem  of  detecting  that 
one  assertion  contradicts  another,  which  would 
result  in  the  class  having  no  instances.  We 
would  like  the  assertion  checker  to  detect 
syntactically  when  such  contradictions  arise, 
and,  where  this  is  not  possible,  it  should  display 
the  assertions  and  their  relationships  for 
confirmation  of  their  correctness  by  the  designer. 
For  this  version  of  TAXIED  we  simply  display  the 
assertions  and  do  not  attempt  any  syntactic 
checks. 
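This version of TAXIED only displays the assertions, but a checker for a small fragment of the assertion language is easy to sketch. The Python below is hypothetical and covers only single-attribute bounds such as `gradepoint > 3.0`. A stricter lower bound implies a weaker one, and a lower and an upper bound contradict each other when no value lies strictly between them. Exact `Fraction` values avoid floating-point surprises when comparing decimals such as 3.5 and 3.0.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class GreaterThan:
    """Assertion `attribute > bound`; a deliberately tiny fragment of assertions."""
    attribute: str
    bound: Fraction

    def implies(self, other: "GreaterThan") -> bool:
        # x > a implies x > b for every x exactly when a >= b.
        # Bounds on different attributes are reported as "not implied".
        return self.attribute == other.attribute and self.bound >= other.bound


@dataclass(frozen=True)
class LessThan:
    """Assertion `attribute < bound`."""
    attribute: str
    bound: Fraction


def contradicts(lower: GreaterThan, upper: LessThan) -> bool:
    # x > a and x < b can both hold for some real x exactly when a < b.
    return lower.attribute == upper.attribute and lower.bound >= upper.bound


general = GreaterThan("highschool_gradepoint", Fraction("3.0"))
prestigious = GreaterThan("highschool_gradepoint", Fraction("3.5"))
```

Here `prestigious.implies(general)` holds, so the REGISTER_AT_PRESTIGIOUS_COLLEGE precondition is a proper specialization of the REGISTER one. The reverse implication does not hold.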

The  last  aspect  of  consistency  checking  deals 
with  the  semantics  of  the  application  being 
designed  as  opposed  to  the  semantics  of  the  TAXIS 
language.  The  syntactic  analyzer  and  assertion 
checker  attempt  to  ensure  that  the  description  of 
the  TAXIS  program  is  consistent  and  complete,  while 
the  external  assertion  checker  is  used  to  verify 
that  various  external  assertions  hold  true.  The 
designer  formulates  several  statements  concerning 
what  he  believes  to  be  true  about  the  model  he  has 
constructed  and  the  external  assertion  checker 
attempts  to  ensure  that  these  assertions  are  valid. 
The  external  assertion  checker  was  not  implemented 
for  this  version  of  TAXIED. 

3.3  Dynamic  Analysis 

The dynamic analysis tools, the interpreter
and the debugger, allow the designer to inspect
the behaviour of the conceptual schema on user
input. Moreover, they provide the designer with
a prototyping facility to test out the design of
the application system. Since a working system is
a more tangible concept than a written
specification, prototyping has proven to be a
useful method of verifying system specifications
with end users. Modeling transactions is useful
since it points out inconsistencies in integrity
assertions and eliminates deficiencies in the
conceptual schema. Furthermore, productivity and
morale benefits are enhanced when immediate feedback
is generated on errors rather than waiting
for separate submissions.

The interpreter simply accepts and evaluates
TAXIS statements. Thus the designer can, among
other things, invoke transactions, display
instances of classes or inspect the conceptual
schema through definitional property selection.

The interpreter will help in the semantic verification
process by identifying errors or inconsistencies
in the conceptual schema that are not
detectable by static semantic analysis alone. We
also feel that the quality of the conceptual
schema specification will be enhanced by observing
its behaviour during simulation by the interpreter.

The interactive debugger's primary purpose is
to aid in the process of discovering and correcting
design errors during interpretation. The debugger
is specifically oriented towards the facilities and
language of TAXIS; this helps enhance the development and
testing process. The user can symbolically request
the dynamic formatted display of values at
user-designated breakpoints or upon entry to or exit
from a transaction. The debugger is also capable
of stepping through action statements and tracing
transaction calls and exception handlers.
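Tracing of transaction calls and exception handlers can be pictured with a Python decorator. This is only an analogy: TAXIED itself runs under Franz Lisp, as the session in Section 4 shows. The `traced` wrapper, the log format and both transactions are hypothetical.

```python
import functools
from typing import Any, Callable, List

trace_log: List[str] = []  # what a tracing debugger might show the designer


def traced(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Record entry to, exit from, and exceptions raised by a transaction."""
    @functools.wraps(fn)
    def wrapper(*args: Any) -> Any:
        trace_log.append(f"enter {fn.__name__} {args}")
        try:
            result = fn(*args)
        except Exception as exc:
            trace_log.append(f"exception {type(exc).__name__} in {fn.__name__}")
            raise
        trace_log.append(f"exit {fn.__name__} -> {result!r}")
        return result
    return wrapper


@traced
def add_student(stud_no: int, name: str) -> int:  # hypothetical stand-in transaction
    return stud_no


@traced
def enrol(stud_no: int) -> None:  # hypothetical transaction that always fails
    raise LookupError("no_such_class")
```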

3.4  System  Utilities 

The  design  environment  provides  several  other 
utilities  which  help  the  designer  during  the 
specification  phase  and  when  using  the  TAXIED 
system  itself.  These  utilities  include  schema 
control  functions,  a  help  facility,  documentation 
generators, and an interface to the TAXIS compiler.

The  TAXIED  system  maintains  a  directory  of 
all  TAXIS  programs  which  have  been  created  and 
saved  by  users  of  the  system.  The  user  can 
retrieve  and  save  stored  programs  as  well  as 
display  the  names  and  versions  of  all  programs 
defined  to  the  system.  The  TAXIED  system  allows 
the  designer  to  maintain  several  versions  of  the 
same  program  through  its  versioning  capabilities. 

TAXIED  has  a  help  facility  which  provides  the 
designer  with  online  information  describing  both 
the  TAXIS  language  and  the  design  environment. 
Comprehensive and informative data is stored on all
TAXIED  commands  as  well  as  various  portions  of  the 
TAXIS  language.  In  an  interactive  system  such  as 
the  TAXIED  design  environment  we  feel  that  it  is 
essential  to  have  this  type  of  documentation 
readily  available  and  accessible. 

Another important form of documentation concerns
the application system being developed. The
TAXIED system maintains cross references and
undefined references for all types of classes.
The designer can use TAXIED commands to display
this information, or he can write tailored TAXIS
transactions to view portions of the data, since
the references are stored in TAXIS data classes.
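Because the references are kept as data, computing them is a simple query. The Python sketch below is hypothetical. The schema is a dictionary from class names to the value classes of their properties, modeled on the classes of the TAXIED session in Section 4, and the set of built-in classes is an assumption.

```python
from collections import defaultdict
from typing import Dict, Set

Schema = Dict[str, Dict[str, str]]  # class name -> {property name: value class name}


def cross_references(schema: Schema) -> Dict[str, Set[str]]:
    """For each referenced class, the classes whose properties refer to it."""
    refs: Dict[str, Set[str]] = defaultdict(set)
    for cls, props in schema.items():
        for value_class in props.values():
            refs[value_class].add(cls)
    return dict(refs)


def undefined_references(schema: Schema, builtins: Set[str]) -> Set[str]:
    """Classes that are referred to but neither defined nor built in."""
    referenced = {v for props in schema.values() for v in props.values()}
    return referenced - set(schema) - builtins


# Hypothetical fragment of the schema built in the session of Section 4.
schema: Schema = {
    "student": {"student_number": "integer", "name": "string", "year": "year_value"},
    "graduate": {"advisor": "faculty_member", "year": "graduate_year_value"},
    "year_value": {},
    "graduate_year_value": {},
}
BUILTINS = {"integer", "string", "boolean"}
```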

The designer can also generate a hard copy report
giving a complete description of a specific TAXIS
program. This report consists of isa hierarchies,
class descriptions, and a cross reference listing.

The  TAXIED  system  also  provides  an  interface 
to  the  TAXIS  compiler  [8].  The  designer  can 
request  that  the  description  of  the  conceptual 
schema  is  to  be  sent  to  the  compiler  and  that  a 
compiled  version  of  the  program  is  to  be  generated. 
The  compiler  generates  a  Pascal/R  data  base  along 
with  Pascal  programs  which  form  the  basis  of  the 
production  system. 

4.  TAXIED  Session 

In  this  section  we  present  a  brief  example  of 
a  TAXIED  session  to  illustrate  a  few  of  the 
system's  capabilities.  The  TAXIED  system  is 
implemented  on  a  VAX  780  under  the  UNIX  operating 
system. The user's input is always preceded by
a system prompt like "TE:" or "INPUT:". The
first step is to start the TAXIED system from
the UNIX shell.

$ taxied

Franz Lisp Opus 36
--> (load 'taxied.l)

PLEASE STANDBYE WHILE THE SYSTEM INITIALIZES...

<TAXIED SYSTEM 1.0>

The  first  set  of  examples  will  show  how  we 
can  define  TAXIS  classes,  edit  class  descriptions, 
and  use  the  system  to  tell  us  what  has  not  been 
specified.  We  will  define  a  variable  class 
"student"  which  will  hold  student  entities.  The 
"student"  class  is  an  instance  of  the  metaclass 
"student_class,"  which  we  define  first.  Each  type 
of  TAXIS  class,  for  example  variable  or  transaction 
classes,  has  different  categories  of  properties. 

In  the  example  below  we  see  that  a  variable  class 
has  key,  characteristic,  and  attribute  properties. 
The  system  displays  the  appropriate  property 
headings  and  prompts  the  user  to  enter  property 
values. 

TE: define metaclass student_class isa variable

TE: define student_class student with
keys
INPUT: student: (student_number);
characteristics
INPUT: student_number: integer;
INPUT: name: string;
INPUT: <cr>
attributeproperties
INPUT: address: string;
INPUT: year: year_value;
INPUT: <cr>
end

Properties  can  be  viewed  as  triples  consisting  of 
one  or  more  subjects,  an  attribute  (or  name),  and 
a  property  value.  For  example  the  key  property 
above  has  the  class  student  as  its  subject;  the 
name  of  the  property  is  student;  and  its  property 
value  is  student_number. 
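The triple view of properties maps naturally onto a small record type. In this hypothetical Python sketch, each property is a (subjects, attribute, value) named tuple, populated from the student definition entered above. `properties_of` collects everything recorded about one subject.

```python
from typing import Dict, List, NamedTuple, Tuple


class Property(NamedTuple):
    subjects: Tuple[str, ...]  # one or more subjects
    attribute: str             # the property's name
    value: str                 # the property value


student_props: List[Property] = [
    Property(("student",), "student", "student_number"),  # the key property
    Property(("student",), "student_number", "integer"),
    Property(("student",), "name", "string"),
    Property(("student",), "address", "string"),
    Property(("student",), "year", "year_value"),
]


def properties_of(subject: str, triples: List[Property]) -> Dict[str, str]:
    """Collect attribute -> value for every triple that has `subject` among its subjects."""
    return {p.attribute: p.value for p in triples if subject in p.subjects}
```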

The TAXIED system keeps track of classes that
have been specified, but not yet defined, in a
special TAXIS variable class called undefined_classes.
Below we show how the designer can use
a TAXIED command that displays the instances of a
class to get a list of all undefined classes.

TE: display-ins undefined_classes

class name : year_value


We  continue  with  our  definition  of  classes  by 
describing  some  finitely_defined  classes  and 
assigning  their  extensions. 

TE: define finitely_defined year_value
TE: assign year_value := { 'freshman', 'sophomore',
'junior', 'senior', 'graduate' };

TE: define finitely_defined graduate_year_value
isa year_value

TE: assign graduate_year_value := { 'graduate' };

Next  we  define  a  specialization  of  the  class 
student  called  graduate.  The  system  displays  the 
previously  defined  properties  of  student  and  we 
are  prompted  to  add  new  properties  for  graduate  or 
specialize  existing  ones. 

TE: define student_class graduate isa student with
keys
student_number: integer;
INPUT: <cr>
characteristics
student_number: integer;
name: string;
INPUT: <cr>
attributeproperties
address: string;
year: year_value;
INPUT: advisor: faculty_member;
INPUT: year: graduate_year_value;
INPUT: <cr>
end

Lastly,  we  will  define  a  transaction  class 
which  will  be  used  to  add  occurrences  of  students 
to the student variable class. When the transaction
is invoked it will prompt the user for
various  attributes  of  a  student  and  then  create 
an  instance  in  the  student  class. 

TE: define transaction add_student with
parameters
INPUT: <cr>
locals
INPUT: stud_no: integer;
INPUT: name: string;
INPUT: address: string;
INPUT: year: string;
INPUT: st: student;
INPUT: <cr>
preconditions
INPUT: <cr>
actions
INPUT: get_info: begin
    'enter student no'.print; stud_no <- read;
    'enter name'.print; name <- read;
    'enter address'.print; address <- read;
    'enter current academic year'.print;
    year <- read
  end;
INPUT: add_info:
    insertobject st into student with
      (student_number: stud_no,
       name: name, address: address, year: year);
INPUT: <cr>
postconditions
INPUT: <cr>
returns
INPUT: <cr>
end

Below  we  give  an  example  of  how  we  can  edit  a 
class  description.  In  this  case  we  change  the 
property  value  of  the  local  variable  year,  in 
add_student,  from  string  to  year_value. 

TE:  edit  add_student 

EDIT:  change  local  year:  year_value; 

EDIT:  <cr> 

EDITING  COMPLETED 

The following set of examples illustrates how
the consistency checker can be used. The full-check
command takes a metaclass as a parameter
and performs consistency checking on the metaclass,
its isa successors, and all of its instances. The
check command operates on a class and all isa
successors of that class.
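The difference between `check` and `full-check` lies only in which classes are visited. The Python sketch below is a hypothetical reconstruction from the description above. `isa_children` and `instances_of` are invented data structures. The built-in metaclass `variable` is skipped, an assumption made so that the visiting order matches the session output that follows.

```python
from typing import Dict, FrozenSet, List

Hierarchy = Dict[str, List[str]]  # class -> its immediate isa successors


def isa_successors(cls: str, isa_children: Hierarchy) -> List[str]:
    """cls followed, depth first, by every class below it in the isa hierarchy."""
    order = [cls]
    for child in isa_children.get(cls, []):
        order.extend(isa_successors(child, isa_children))
    return order


def check_order(cls: str, isa_children: Hierarchy) -> List[str]:
    # `check`: the class itself and all of its isa successors.
    return isa_successors(cls, isa_children)


def full_check_order(metaclass: str, isa_children: Hierarchy,
                     instances_of: Dict[str, List[str]],
                     builtin: FrozenSet[str] = frozenset()) -> List[str]:
    # `full-check`: the metaclass, its isa successors, and their instances.
    order: List[str] = []
    for m in isa_successors(metaclass, isa_children):
        if m not in builtin:  # assumption: built-in metaclasses are not re-checked
            order.append(m)
        order.extend(instances_of.get(m, []))
    return order


# Hypothetical encoding of the classes defined earlier in the session.
isa_children: Hierarchy = {
    "variable": ["student_class"],
    "student": ["graduate"],
    "year_value": ["graduate_year_value"],
}
instances_of = {"student_class": ["student", "graduate"]}
```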

TE:  check  year_value 

checking  class:  year_value 
checking  class:  graduate_year_value 

TE: full-check variable

checking class: student_class
checking class: student
checking class: graduate

***CONSISTENCY ERROR*** IN CLASS: graduate
AND PROPERTY: advisor --
THE CLASS faculty_member IS NOT DEFINED

TE: define variable faculty_member
TE: define for faculty_member
characteristic name: string;

TE: check graduate

checking class: graduate

TE: check add_student

checking class: add_student

We ran into a consistency error because the class
faculty_member used in graduate was not defined.
We corrected the error and proceeded to consistency
check the remaining classes. Now we can
demonstrate use of the interpreter by invoking the
add_student transaction.

TE: interpret add_student;

enter student no
45901
enter name
"Steve Jones"
enter address
"255 GlenLake, Toronto"
enter current academic year
"freshman"

TE: display-ins student

student_no : 45901
name : Steve Jones
address : 255 GlenLake, Toronto
year : freshman


The display-ins command was used to verify that
the instance of student was in fact created.
Finally, below we show how another TAXIED command,
display-suc, is used to display the isa successor
hierarchy beneath a specified class.

TE: display-suc student
student
    graduate

5.  Summary 

The TAXIS project attempts to provide a
systematic approach to the design and development
of information systems. We are applying the same
general principles, concept specialization and
other abstraction mechanisms, to the requirements
specification and conceptual modeling phases of
the information system life cycle. The TAXIS
system enhances the traditional life cycle process
by simplifying the logical and physical database
design stages, and by improving the interface
between design stages. Furthermore, TAXIS
provides a comprehensive set of tools to assist
the designers and developers, since we contend
that successful utilization of a systematic
approach requires tools that support and
facilitate the general principles of the design
methodology.

We have taken a brief look at the TAXIED system,
which is intended to help designers model information
systems at the conceptual level. The
TAXIED system was designed specifically to
encourage and support the use of the TAXIS design
methodology. The system is knowledgeable about the
syntax and semantics of the conceptual design
language. It supports and facilitates the use of
concept specialization, exception handling, and
the other abstraction mechanisms employed by
TAXIS. The set of tools that TAXIED provides is
both integrated and interactive. We view the
TAXIED system as a comprehensive set of specialized
tools, each tool designed to perform a
specific task. The integration of these tools
provides a productive environment for the
designer.

The  TAXIS  language  has  been  used  quite 
successfully  in  the  design  of  two  prototypical 
medical  information  systems  for  a  large  hospital 
in Toronto [3] [4]. We feel that TAXIED, the
integrated  interactive  design  environment  that 
we have briefly described here, will not only
improve the productivity of the designers of
information  systems  using  TAXIS,  but  also  enhance 
the  quality  of  the  systems  being  developed. 
"The real problem is the mass of detailed
requirements;  and  the  only  solution  is 
the  discovery  or  invention  of  general 
rules  and  abstractions  which  cover  the 
many  thousands  of  cases  with  as  few 
exceptions  as  possible." 

-  C.  A.  R.  Hoare 

Abstract 

The  view  is  adopted  that  software  requirements 
involve  the  representation  (modeling)  of 
considerable  real-world  knowledge,  not  just 
functional  specifications.  A  framework  (RMF)  for 
requirements  models  is  presented  and  its  main 
features  are  illustrated.  RMF  allows  information 
about  three  types  of  conceptual  entities  (objects, 
activities,  and  assertions)  to  be  recorded 
uniformly  using  the  notion  of  properties.  By 
grouping  all  entities  into  classes  or  metaclasses, 
and  by  organizing  classes  into  generalization 
(specialization)  hierarchies,  RMF  supports  three 
abstraction principles (classification,
aggregation, and generalization) which appear to be
of  universal  importance  in  the  development  and 
organization  of  complex  descriptions.  Finally,  by 
providing  a  mathematical  model  underlying  our 
terminology,  we  achieve  both  unambiguity  and  the 
potential  to  verify  consistency  of  the  model. 


1. INTRODUCTION

Requirements  definition  is  the  task  of  gathering 
all  of  the  relevant  information  to  be  used  in 
understanding  a  problem  situation  prior  to  system 
development. The documentation of this information
is  called  the  requirements  specification.  The  form 
and  content  of  the  requirements  specification  can 
have  a  tremendous  impact  on  the  task  of  software 
throughout  its  lifetime. 

Experience  over  the  last  decade  has  led  to  some 
important observations that point to the need for
improved  requirements  specification  languages. 
First,  it  appears  that  more  attention  to 
requirements,  including  a  better  understanding  of 


the  problem  situation,  pays  off  in  reduced  total 
life-cycle effort and cost [3]. Secondly, it has
been  learned  that  it  is  difficult,  indeed,  to  "get 
the  requirements  right";  some  common  problems  are 
ill-defined  terms,  inconsistencies,  ambiguities, 
and  the  tendency  to  mix  requirements  with  design 
decisions [1].

Much of requirements definition involves such
tasks as: defining terms in the domain of
discourse; stating, clarifying, and agreeing on
assumptions and constraints; and discussing and
negotiating the needs and objectives of an
organization (business, government, industry).
Whatever the application "world" (e.g., airline
reservations, manufacturing, hospital
administration), there is a body of
"knowledge" used to interpret and understand that
world.

For  example,  in  considering  the  development  of  a 
variety  of  information  systems  for  a  large  hospital 
in  Toronto,  we  have  found  it  necessary  to  become 
intimately  familiar  with  a  wide  range  of  subject 
matters:  medical  knowledge,  hospital  procedures  and 
policies, available therapies (drugs, surgery,
etc.), legal responsibilities to government, and so
on. We believe that this kind of real-world
knowledge needs to be captured in a formal
requirements specification. The ability to
efficiently design appropriate computer systems and
enable them to evolve over their lifetime depends
on the extent to which this knowledge can be
captured.

Most  current  requirements  languages  concentrate 
primarily on functional specifications, which give
a high-level target system description in terms of
the functions to be performed by the ultimate
system (with an emphasis on what the system is
supposed to do but not how). In functional
specification approaches, the world knowledge,
whose  importance  we  have  been  discussing,  is  often 
not  an  explicit  part  of  the  requirements 
specification.  The  knowledge  is,  at  least 
initially,  scattered  throughout  documents  and  the 
minds  of  people  across  the  organization. 

Our research addresses the development of
languages and tools for requirements modeling, a
specification approach directed toward high-level
specifications which capture world knowledge
directly and naturally in the specification. In
this paper we present a framework for requirements
modeling called RMF. An RMF model describes some
"slice of reality". RMF guides the information-gathering
task of requirements definition. The
resultant description should be useful in
determining what the problem situation is and what
solutions are possible. The design of RMF
emphasizes  the  need  for  structuring  the  description 
to  ease  the  task  of  finding  answers  to  questions 
during  system  design  and  implementation  and  in  the 
face  of  changing  requirements. 

Section  2  of  the  paper  introduces  our  modeling 
approach  and  discusses  the  central  theme, 
abstraction  mechanisms.  Section  3  is  the  core  of 
the  paper,  presenting  the  main  features  of  RMF 
along  with  illustrative  examples.  Section  4 
discusses  related  work  and  fills  in  some  background 
material.  Finally,  Section  5  discusses  some 
interesting  aspects  of  the  framework  and  of  our 
ongoing  research. 

2. MODELING AND ABSTRACTION
2.1 A modeling approach

The  major  goal  of  RMF  is  to  form  a  synthesis  of 
some  modeling  principles  which  we  believe  are 
essential  to  the  requirements  modeling  task;  but  we 
stop  short  of  providing  a  specific  language 
incorporating  them.  We  apply  these  principles  to 
answer  two  key  questions  about  requirements 
modeling:  (1)  What  kind  of  information  should  be 
captured  in  a  requirements  model?  (2)  How  should  a 
requirements  model  be  structured? 

We  have  chosen  a  representation  method  in  which 
a  requirements  model  consists  of  a  collection  of 
conceptual  entities  (or  simply  entities)  defined  by 
their  inter-relationships  with  other  entities. 
Thus,  the  first  question  above  is  answered  by  (i) 
choosing  appropriate  specification  units,  or  entity 
categories, and (ii) deciding what kinds of
relationships  are  allowed  within  and  between 
entities  of  each  entity  category;  later  on  in  the 
presentation  we  shall  call  the  relationships 
properties  and  the  types  of  relationships  property 
categories. 

RMF  offers  three  kinds  of  specification  units: 
object, activity, and assertion. As discussed in
Section  4,  these  entity  categories  have  been  used 
successfully,  in  one  form  or  another,  across  a  wide 
range  of  modeling  endeavors. 

For  a  long  time  it  has  been  asserted  that 
abstraction  is  the  best  tool  we  have  toward  the 
intellectual  manageability  of  complex  descriptions 
[11, 14]. The framework is based on the premise that
effective structuring of large descriptions such as
requirements models depends on the use of good
abstractions. Below we introduce the abstraction
mechanisms used in the framework.


2.2 Abstraction mechanisms

An  abstraction  mechanism  is  any  descriptive 
facility  that  allows  certain  kinds  of  information 
to  be  included  while  precluding  other, 
"lower-level"  or  "less  important"  details.  In 
Software  Engineering,  abstraction  is  usually 
equated  with  the  suppression  of  design  decisions  or 
implementation  details.  In  this  sense  a 
requirements  model  should  be  "very  abstract"; 
indeed, it is proposed as the "most abstract"
specification for use in Software Engineering.

In RMF we propose the use of a set of
complementary abstraction mechanisms for
descriptive purposes. A first abstraction
mechanism, aggregation, allows an entity to be
viewed as a collection of components, or parts. A
second abstraction mechanism is classification,
which allows a new entity, the class, to capture
common characteristics shared by a group of
entities. A third abstraction, generalization,
captures the common characteristics of several
classes.

Classification  allows  one  to  consider  only  the 
characteristics  shared  by  the  instances  of  a  class 
and  to  ignore  individual  differences.  Aggregation 
allows  one  to  consider  an  entity  while  ignoring 
further  detail  about  the  components.  Generalization 
allows  one  to  consider  only  those  properties  that  a 
collection of classes has in common, without
considering the classes' individual differences.

A principal design goal of RMF is to apply
the abstraction mechanisms uniformly over all the
entity categories. That is to say, there are, in an
RMF model, classes for object, activity, and
assertion entities; entities of all categories can
have component parts; and classes of each entity
category are organized according to their
generality/specificity.

We  feel  that  a  specification  language  exhibiting 
such  uniformity  will  be  much  easier  to  formalize, 
understand  and  use. 

3. THE FEATURES OF RMF

We  first  show  in  terms  of  objects  how  the 
abstraction  mechanisms  work,  and  then  we  extend  the 
presentation  to  activities  and  assertions  as  well. 

3.1 Tokens, classes, and metaclasses

Entities are stratified into classification
levels according to whether they are considered
"individuals", called tokens; collections of tokens,
called classes; classes of classes, called
metaclasses; and so on. The tokens of a class are
called its instances; similarly, a class is said to
be an instance of a metaclass.

Simple  examples  of  object  tokens  are  john-smith 
(representing  a  particular  person)  and  7 
(representing  the  number  7).  PERSON  is  an  example 
of  a  class,  whose  instances  are  tokens  such  as 
john-smith,  while  INTEGER  is  a  class  whose 
instances  would  include  7. 

An example of a metaclass is PERSON_CLASS, whose
instances  are  classes  of  persons,  such  as  PERSON, 
PATIENT,  PHYSICIAN,  NURSE. 

In  addition  to  its  instances,  a  class  bears 
additional  information,  but  for  this  we  need  the 
notion  of  properties. 

For  the  remainder  of  the  paper  we  will  use  lower 
case  letters  and  digits  for  token  identifiers  and 
upper  case  letters  for  class  and  metaclass 
identifiers. The suffix "_CLASS" will be used for
metaclass identifiers.
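
The three classification levels and the naming convention can be illustrated with a short Python sketch. The dictionary encoding of the "instance of" relationship and the functions `level`, `well_stratified`, and `instances` are hypothetical illustrations, not part of RMF: the sketch reads an identifier's level off the convention just stated and checks that every entity is an instance of classes exactly one level above it. It covers only tokens, classes, and metaclasses, not any higher levels.

```python
# Hypothetical "instance of" table: entity -> classes it is an instance of.
instance_of: dict[str, set[str]] = {
    "john-smith": {"PERSON"},
    "7": {"INTEGER"},
    "PERSON": {"PERSON_CLASS"},
    "PATIENT": {"PERSON_CLASS"},
    "PHYSICIAN": {"PERSON_CLASS"},
    "NURSE": {"PERSON_CLASS"},
}


def level(identifier: str) -> int:
    """Classification level read off the naming convention:
    0 = token, 1 = class, 2 = metaclass."""
    if identifier.endswith("_CLASS"):
        return 2
    if identifier.isupper():
        return 1
    return 0  # lower case letters and digits: a token


def well_stratified(table: dict[str, set[str]]) -> bool:
    """Every entity must be an instance of classes exactly one level up."""
    return all(level(c) == level(e) + 1
               for e, classes in table.items() for c in classes)


def instances(cls: str, table: dict[str, set[str]]) -> set[str]:
    """All entities recorded as instances of cls."""
    return {e for e, classes in table.items() if cls in classes}
```

For example, `well_stratified({"john-smith": {"PERSON_CLASS"}})` is false, because a token cannot be an instance of a metaclass directly.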

3.2  Properties  —  factual  and  definitional 

Entities  can  be  related  to  other  entities  by 
participating  in  properties.  Properties  consist  of 
three  items  of  information:  a  subject ,  an  attribute 
(or  property  name),  and  a  value.  To  express  the 
property of the token john-smith that his age is
23,  we  could  write 

<john-smith,  age,  23> 

where the subject, attribute, and value are
john-smith, age (an identifier), and 23,
respectively. This property expresses "factual"
information  about  the  subject  and  thus  is  termed  a 
factual  property. 

Factual information alone is clearly not
adequate for requirements modeling. We have
introduced classes (and metaclasses) for the
purpose of defining and describing collections of
entities  that,  presumably,  are  grouped  together 
because  some  uniform  conditions  hold  over  all  of 
them.  What  is  needed  is  a  facility  for  specifying 
generic  information  that  pertains  to  each  of  the 
instances  of  a  class  (or  metaclass).  For  example, 
we  may  want  to  specify  for  the  class  PERSON  that 
each  person  has  an  age  which  is  an  AGE_VALUE.  The 
RMF  feature  used  for  this  is  called  a  definitional 
property.  The  triple 

(PERSON, age, AGE_VALUE)

is  used  for  now  to  represent  the  above  information. 
The  three  components  are  again  called  subject, 
attribute,  and  value.  (Note  that  we  have  used 
angular  brackets  and  parentheses  to  distinguish  the 
two kinds of properties.)

It  may  be  helpful  to  think  of  a  definitional 
property  as  defining  a  function,  e.g., 
age: PERSON --> AGE_VALUE

whose  domain  is  the  property  subject  and  whose 
range is the property value. Evaluation of the
function  for  an  instance  of  the  domain  results  in  a 
corresponding  factual  property,  e.g. 

age(john-smith) = 23.

The  most  important  point  to  notice  here  is  that 
23  is,  and  must  be,  an  instance  of  AGE_VALUE.  In 
general,  for  a  class  C  with  definitional  property 
(C, a, V), for every instance x of C, it must be the
case  that  a(x)  is  an  instance  of  V.  The  close 
correspondence  between  definitional  and  factual 
properties is called the property induction
principle. The principle requires as well that
every  factual  property  of  an  object  be  induced  by  a 
definitional  property  of  some  containing  class. 


It  is  useful  to  be  able  to  associate  factual 
properties  to  classes  as  well  as  to  tokens.  For 
example,  the  information 

"the  average  age  of  persons  is  21" 

"the  number  of  nurses  is  200" 
should  be  considered  factual  rather  than 
definitional,  that  is 

<PERSON, average-age, 21>

<NURSE,  cardinality,  200> 

since  the  information  pertains  directly  to  the 
subjects  rather  than  to  their  instances. 

The  inclusion  of  metaclasses  in  the  framework 
allows  the  property  induction  principle  to  be 
extended.  In  order  to  allow  the  specification  of 
the  above  two  factual  properties,  it  would  be 
necessary  to  have  also  specified  definitional 
properties  that  induce  them,  such  as 

(PERSON_CLASS, average-age, AGE_VALUE)
(PERSON_CLASS,  cardinality,  NUMBER). 
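
The property induction principle lends itself to a mechanical check. The Python sketch below is one possible reading under stated assumptions: definitional and factual properties are encoded as plain tuples, classification is an explicit set of (instance, class) pairs that also lists the members of value classes such as AGE_VALUE and NUMBER, and `check_induction` is a hypothetical helper. The same check applies at the token level (<john-smith, age, 23>) and at the class level (<NURSE, cardinality, 200>).

```python
# Hypothetical tuple encoding (not RMF syntax).
# Definitional properties (C, a, V); factual properties (x, a, v).
definitional = {
    ("PERSON", "age", "AGE_VALUE"),
    ("PERSON_CLASS", "average-age", "AGE_VALUE"),
    ("PERSON_CLASS", "cardinality", "NUMBER"),
}
# Classification as explicit (instance, class) pairs, value classes included.
instance_of = {
    ("john-smith", "PERSON"),
    ("PERSON", "PERSON_CLASS"), ("NURSE", "PERSON_CLASS"),
    (23, "AGE_VALUE"), (21, "AGE_VALUE"), (200, "NUMBER"),
}


def check_induction(factual, definitional, instance_of) -> bool:
    """Every factual property <x, a, v> must be induced by some definitional
    property (C, a, V) with x an instance of C, and v must be an instance of
    V for every such governing property."""
    for x, a, v in factual:
        governing = [val for c, attr, val in definitional
                     if attr == a and (x, c) in instance_of]
        if not governing:
            return False  # no definitional property induces this fact
        if any((v, val) not in instance_of for val in governing):
            return False  # value is not an instance of the declared class
    return True
```

Because `check_induction` only consults "instance of" pairs, it does not depend on the classification level of the subject, which mirrors the way metaclasses extend the principle to classes. For instance, <john-smith, cardinality, 200> is rejected because john-smith is not an instance of PERSON_CLASS.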

At  this  point  in  the  paper,  we  have  described 
how  two  abstraction  principles,  classification  and 
aggregation,  are  incorporated  into  the  framework. 
The  classification  abstraction  is  supported  by  the 
"instance  of"  relationship  between  tokens  and 
classes  and  between  classes  and  metaclasses;  we 
consider  the  three  levels  to  be  adequate  for  most 
modeling  purposes.  The  grouping  of  all  the 
properties  of  an  entity  relates  the  entity  to  other 
entities  which  may  in  turn  be  the  subjects  of  other 
properties;  this  supports  the  aggregation 
abstraction.  The  property  induction  principle 
relates  these  two  dimensions  in  a  coherent  way. 

3.3 Objects, activities, assertions

As implied by the examples above, objects
represent  the  "things"  in  the  world,  such  as 
persons,  numbers,  equipment,  documents,  etc. 

PERSON_CLASS PATIENT
  association ward: HOSPITAL_WARD,
              primary-physician: PHYSICIAN,
              consulting-physician: PHYSICIAN,
  inserted-by register: ADMIT_PATIENT,
  initially phys-ward?:
      primary-physician.specialty ≠ ward
        => consulting-physician.specialty = ward,
  updated-by transfer:
      TRANSFER(self, new-ward: WARD),
  removed-by release: RELEASE(self),
end {PATIENT}

Figure 1

Figure  1  is  a  description  of  the  object  class 
PATIENT,  giving  general  information  about  patients 
for  a  particular  hospital.  The  class  is  defined  to 
be  an  instance  of  the  metaclass  PERSON_CLASS  and 
has  a  number  of  definitional  properties  consisting 
of 

<attribute>: <value>

pairs, grouped into property categories such as
association, inserted-by, and initially.

The  properties  of  PATIENT  relate  each  patient  to 
a  ward  and  two  physicians,  one  primary  and  the 
other consulting. Patients are "created" through an
ADMIT_PATIENT activity, updated through a TRANSFER
activity  and  removed  through  a  RELEASE  activity. 
When  a  patient  is  first  created,  it  must  be  the 
case  that  if  the  ward  to  which  a  patient  is 
assigned  is  not  the  specialty  of  his/her  primary 
(care)  physician,  then  it  must  be  the  specialty  of 
the  consulting  physician. 
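
The initially constraint phys-ward? is an implication that must hold when a patient is first created. A minimal Python sketch, assuming that wards and specialties can be compared as plain strings (the `Physician` and `Patient` dataclasses and `phys_ward_ok` are hypothetical), rewrites "A ≠ B => C = B" as the equivalent disjunction "A = B or C = B":

```python
from dataclasses import dataclass


@dataclass
class Physician:
    specialty: str  # assumed to name a ward, e.g. "cardiology"


@dataclass
class Patient:
    ward: str
    primary_physician: Physician
    consulting_physician: Physician


def phys_ward_ok(p: Patient) -> bool:
    """phys-ward?: primary-physician.specialty != ward
    => consulting-physician.specialty = ward."""
    return (p.primary_physician.specialty == p.ward
            or p.consulting_physician.specialty == p.ward)
```

A patient whose ward matches neither physician's specialty violates the constraint; in RMF terms, such a PATIENT instance could not be created.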


ACTIVITY_CLASS ADMIT_PATIENT
  input p: PERSON,
  control w: WARD,
          phys, consulting-phys: PHYSICIAN,
  output pt: PATIENT,
  triggered-by al: ARRIVAL(p),
  precondition already-in?: NOT INST(p, PATIENT),
               room-left?:
                   PATIENT.cardinality < PATIENT_MAX,
  postcondition admitted?: IN_HOSPITAL(p),
  part check-id: CHECK_ID(p),
       put: CHOOSE_WARD(w, phys, consulting-phys),
       admit: INSERT(p, PATIENT),
       increment: INCREMENT(PATIENT.cardinality),
       urinalysis: PERFORM_URINALYSIS(p),
       blood-count: PERFORM_BLOOD_COUNT(p),
       blood-pressure: PERFORM_BLOOD_PRESSURE(p),
       temp: TAKE_TEMP(p),
end {ADMIT_PATIENT}

Figure 2

In  Figure  2,  we  present  a  definition  of  the 
activity  class  ADMIT_PATIENT.  It  consists  of  one 
input  property,  a  person,  and  one  output  property, 
a  patient;  also,  it  has  three  control  properties,  a 
ward  and  two  physicians.  The  activity  is  triggered 
by  instances  of  the  assertion  class,  ARRIVAL,  which 
is  instantiated  each  time  a  person  arrives  at  the 
hospital  for  admission. 

ADMIT_PATIENT  also  includes  two  preconditions 
(already-in?  and  room-left?)  and  a  postcondition 
(admitted?) that must be true before and after the
activity, respectively. The preconditions assert
that the person must not be already a patient of
the hospital, and also that the number of patients
(PATIENT.cardinality) is less than the hospital
capacity (PATIENT_MAX). The postcondition asserts
that  the  person  has  indeed  been  admitted  once  the 
ADMIT_PATIENT  activity  is  over. 
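
The role of the preconditions and the postcondition can be sketched in Python. In this sketch the hospital state is reduced to two sets, PATIENT_MAX is given an arbitrary small value, only the admit part of the body is modeled, and `Hospital`, `require`, and `ConditionViolated` are hypothetical names; a violated condition is reported by raising an exception named after the corresponding RMF property.

```python
PATIENT_MAX = 3  # hypothetical capacity, kept small for illustration


class ConditionViolated(Exception):
    """Raised when a precondition or postcondition is false."""


def require(condition: bool, name: str) -> None:
    if not condition:
        raise ConditionViolated(name)


class Hospital:
    def __init__(self) -> None:
        self.patients: set[str] = set()  # extension of PATIENT
        self.present: set[str] = set()   # persons physically present

    def in_hospital(self, p: str) -> bool:
        # IN_HOSPITAL(p): INST(p, PATIENT) and PHYSICALLY_PRESENT(p)
        return p in self.patients and p in self.present

    def admit_patient(self, p: str) -> None:
        require(p not in self.patients, "already-in?")
        require(len(self.patients) < PATIENT_MAX, "room-left?")
        # Body, abridged to the admit part. Here PATIENT.cardinality is
        # len(self.patients), so the increment part needs no separate step.
        self.present.add(p)
        self.patients.add(p)
        require(self.in_hospital(p), "admitted?")
```

After three admissions, a fourth distinct person fails room-left?, while re-admitting an existing patient fails already-in? first, since preconditions are checked in the order they are listed.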

Finally,  the  "body"  of  ADMIT_PATIENT  is  defined 
by  several  part  properties  which  specify  the 
components  of  an  ADMIT_PATIENT  activity.  The 
components  involve  checking  the  person's  ID, 
choosing  a  ward  and  assigning  a  primary  care  and  a 
consulting  physician,  inserting  the  person  into  the 
PATIENT  class  and  incrementing  the  cardinality 
property  of  PATIENT;  also,  some  tests  are  performed 
(urinalysis,  blood-count,  blood-pressure,  and 
temperature).

We present an example of an assertion class in
Figure 3 to underscore the uniform treatment of the
entity categories. The assertion class IN_HOSPITAL
has one argument, a person, and asserts through
its part properties that the person is a patient
and is now physically present at the hospital.

ASSERTION_CLASS IN_HOSPITAL
  argument p: PERSON,
  part patient?: INST(p, PATIENT),
       present?: PHYSICALLY_PRESENT(p),
end {IN_HOSPITAL}

Figure 3

We  close  this  section  by  giving  a  definition  of 
the metaclass PERSON_CLASS, in Figure 4. As stated
in  the  previous  section,  PATIENT  is  entitled  to  a 
cardinality  property  only  because  it  is  an  instance 
of  this  metaclass.  Note  that  METACLASS  is  a 
built-in  metametaclass  that  has  all  metaclasses  as 
instances.  Two  other  built-in  metametaclasses  have 
been  found  useful  because  they  allow  references  to 
all  entities  and  to  all  generic  entities, 
respectively:  (i)  ENTITY,  which  has  as  instances 
all  entities  in  a  specification,  including  itself, 
and  (ii)  CLASS,  which  has  as  instances  all  classes, 
metaclasses,  and  the  three  built-in  metametaclasses 
METACLASS,  CLASS,  and  ENTITY. 


METACLASS PERSON_CLASS
  association average-age: AGE_VALUE,
              cardinality: NUMBER,
end {PERSON_CLASS}

Figure 4
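
The membership rules for the three built-in metametaclasses can be stated as set equations. The Python sketch below is a hypothetical illustration (the function `builtin_extensions` is not part of RMF): given the user-defined tokens, classes, and metaclasses, it computes the instances of ENTITY, CLASS, and METACLASS as described above, including the fact that ENTITY is an instance of itself.

```python
BUILTINS = {"ENTITY", "CLASS", "METACLASS"}


def builtin_extensions(tokens: set[str], classes: set[str],
                       metaclasses: set[str]) -> dict[str, set[str]]:
    """Instances of the built-in metametaclasses."""
    return {
        # every entity in the specification, including ENTITY itself
        "ENTITY": tokens | classes | metaclasses | BUILTINS,
        # all classes, all metaclasses, and the three metametaclasses
        "CLASS": classes | metaclasses | BUILTINS,
        # all metaclasses
        "METACLASS": set(metaclasses),
    }
```

With these definitions the extensions nest: every metaclass is a class, and every class is an entity.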


3.4 Generalization

To  support  the  generalization  abstraction 
mechanism,  a  new  relationship,  subclass,  is  offered 
which  can  be  declared  between  two  classes  or  two 
metaclasses.  For  example,  suppose  PERSON  has  been 
defined  as  shown  in  Figure  5.  (Note:  The  part 
properties  of  a  data  class  instance  do  not  change 
values,  while  the  association  properties  do.)  Then, 
changing  the  first  line  of  the  definition  of 
PATIENT  (Figure  1)  to 

PERSON_CLASS  PATIENT  subclass  of  PERSON 
makes  PATIENT  a  subclass  or  specialization  of 
PERSON  and  PERSON  a  generalization  of  PATIENT. 


PERSON_CLASS PERSON
  part name: PERSON_NAME,
       sin: SOCIAL_INSURANCE_#,
       ohip: ONTARIO_INS_#,
  association address: ADDRESS,
              age: AGE_VALUE,
end {PERSON}

Figure 5

What  does  it  mean  to  say  that  PATIENT  is  a 
specialization  of  PERSON?  Well,  for  one  thing  we 
expect  that  every  instance  of  PATIENT  is,  under  all 
circumstances,  also  an  instance  of  PERSON.  Indeed, 
the  semantics  of  becoming  an  instance  of  a  class 
include becoming an instance of all of the
generalizations of the class. Conversely, when an
object  ceases  to  be  an  instance  of  a  class,  it  also 
ceases  to  be  an  instance  of  its  specializations. 
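
The two rules just stated, that becoming an instance propagates up to every generalization and ceasing to be one propagates down to every specialization, can be sketched in Python. The class names follow the text, but the dictionaries `isa` and `members` and the functions `insert` and `remove` are hypothetical; they are not RMF's INSERT activity or any other built-in operation.

```python
# Hypothetical immediate-generalization links and class extensions.
isa: dict[str, set[str]] = {"PATIENT": {"PERSON"},
                            "SURGICAL_PATIENT": {"PATIENT"}}
members: dict[str, set[str]] = {"PERSON": set(), "PATIENT": set(),
                                "SURGICAL_PATIENT": set()}


def generalizations(c: str) -> set[str]:
    """All classes above c, following subclass links transitively."""
    found: set[str] = set()
    stack = list(isa.get(c, ()))
    while stack:
        g = stack.pop()
        if g not in found:
            found.add(g)
            stack.extend(isa.get(g, ()))
    return found


def specializations(c: str) -> set[str]:
    """All known classes below c."""
    return {s for s in members if c in generalizations(s)}


def insert(x: str, c: str) -> None:
    """Becoming an instance of c also makes x an instance of c's generalizations."""
    for k in {c} | generalizations(c):
        members[k].add(x)


def remove(x: str, c: str) -> None:
    """Ceasing to be an instance of c also removes x from c's specializations."""
    for k in {c} | specializations(c):
        members[k].discard(x)
```

Inserting john-smith into SURGICAL_PATIENT places him in PATIENT and PERSON as well; removing him from PATIENT then removes him from SURGICAL_PATIENT but leaves him a PERSON.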

Another  aspect  of  specialization  concerns  the 
definitional  properties  of  the  two  classes 
involved.  All  definitional  properties  of  PERSON  are 
inherited  by  PATIENT;  so,  by  virtue  of  being 
declared  a  specialization  of  PERSON,  PATIENT  has, 
in  addition  to  the  properties  specified  in  Figure 
1,  also  a  name,  a  social  insurance  number,  an 
address,  etc. 

Property  inheritance  allows  for  economy  of 
expression  in  a  specification  because  a 
definitional  property  need  only  be  mentioned  once 
for  the  most  general  class  to  which  it  is 
applicable.  Inheritance  also  serves  as  a  memory 
aid,  since  knowing  that  a  class  is  a  subclass  of 
another  allows  one  to  concentrate  on  the  additional 
information  needed  to  describe  the  subclass. 


PERSON 


(PHYSICIAN) 


(NURSE) 


CHILD_PATIENT  /  MEDICAL  PATIENT 

SURGICAL  PATIENT 


\ 

(SURGICAL  CHILD  PATIENT) 


TRANS PLANT_SUR(ZRY_PATIENT 
(The  classes  in  parentheses  have  not  been  defined.) 


PERSON_CLASS CHILD subclass of PERSON
  association age: CHILD_AGE_VALUE,
              guardian: PERSON,
  invariant guardian.age > 30,
end {CHILD}

PERSON_CLASS SURGICAL_PATIENT subclass of PATIENT
  association blood-type: BLOOD_TYPE,
              surgery: SURGERY_TYPE,
end {SURGICAL_PATIENT}

PERSON_CLASS TRANSPLANT_SURGERY_PATIENT
    subclass of SURGICAL_PATIENT
  association donor: PERSON,
end {TRANSPLANT_SURGERY_PATIENT}

PERSON_CLASS CHILD_PATIENT
    subclass of CHILD, PATIENT
  association nurse: NURSE,
end {CHILD_PATIENT}


Figure 6


To  illustrate  the  importance  of  generalization, 
suppose  we  have  already  defined  the  class  PERSON 
and  its  specialization  PATIENT.  A  number  of  other 
object  classes  are  also  relevant  for  our  hospital 
example.  CHILD  specializes  PERSON  by  restricting 
its  age  property  to  allow  only  values  in 
CHILD_AGE_VALUE. SURGICAL_PATIENT as well as
TRANSPLANT_SURGERY_PATIENT are specializations of
PATIENT. CHILD_PATIENT gives an example of a class
that has more than one immediate generalization.
Figure 6 includes the definitions of all these
object classes while Figure 7 summarizes the
subclass relation for the object classes defined so
far.


Figure 7


Specialization opens the door to a form of
stepwise refinement that is based on the
introduction of detail for special cases. Moreover,
this form of refinement is not applicable only to
object classes. Consider, for example, some
specializations of ADMIT_PATIENT, as shown in
Figure 8. The first, ADMIT_CHILD_PATIENT, simply
assigns a nurse to the child patient in addition to
all the things done for other patients. The second,
ADMIT_SURGICAL_PATIENT, specializes ADMIT_PATIENT
for surgical patients, where a blood test is done
for possible transfusion. ADMIT_SURGICAL_CHILD_PATIENT,
the third, is a specialization of the previous two
activities and therefore inherits all their
definitional properties; in addition, it has a
definitional property of its own which obtains
permission for surgery from the child's guardian.


ACTIVITY_CLASS ADMIT_CHILD_PATIENT
    subclass of ADMIT_PATIENT
  input p: CHILD,
  control n: NURSE,
  output pt: CHILD_PATIENT,
  part find-nurse: FIND_NURSE(n, w),
       admit: INSERT(p, CHILD_PATIENT),
end {ADMIT_CHILD_PATIENT}

ACTIVITY_CLASS ADMIT_SURGICAL_PATIENT
    subclass of ADMIT_PATIENT
  output pt: SURGICAL_PATIENT,
  triggered-by al: ARRIVAL(p)
                   AND SURGERY_NEEDED(p),
  part blood-typing: PERFORM_BLOOD_TYPING,
end {ADMIT_SURGICAL_PATIENT}

ACTIVITY_CLASS ADMIT_SURGICAL_CHILD_PATIENT
    subclass of ADMIT_CHILD_PATIENT,
                ADMIT_SURGICAL_PATIENT
  part obtain-permission:
       OBTAIN_PERMISSION(p, p.guardian),
end {ADMIT_SURGICAL_CHILD_PATIENT}


Figure 8

Note  that  redefinitions  of  definitional 
properties  must  be  consistent  with  the  properties 
they  replace.  For  example,  the  value  of  the  age 
property of CHILD, CHILD_AGE_VALUE, must be a
specialization of AGE_VALUE (Figures 5 and 6).
Similarly, the redefinitions of properties such as
p and pt in ADMIT_CHILD_PATIENT are all consistent
with the properties of ADMIT_PATIENT they replace
(Figures 2 and 8). An interesting application of
this consistency rule involves properties whose
value is an assertion class, such as the al property
of ADMIT_PATIENT (Figure 2). For ADMIT_PATIENT the
value of al is the assertion ARRIVAL(p), while for
ADMIT_SURGICAL_PATIENT (Figure 8) the value of al
is the stronger assertion ARRIVAL(p) AND
SURGERY_NEEDED(p).
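
The consistency rule for redefinitions can be checked mechanically for object-valued properties. The Python sketch below is a hypothetical illustration: `isa` records immediate generalizations taken from the text and Figure 6 (CHILD_AGE_VALUE being a specialization of AGE_VALUE is the requirement stated above), and `consistent_redefinition` requires each redefined value class to equal or specialize the one it replaces. Assertion-valued properties such as al would need an implication test instead, which this sketch does not attempt.

```python
# Hypothetical encoding of immediate subclass links.
isa: dict[str, set[str]] = {
    "CHILD_AGE_VALUE": {"AGE_VALUE"},
    "CHILD": {"PERSON"},
    "PATIENT": {"PERSON"},
    "CHILD_PATIENT": {"CHILD", "PATIENT"},
}


def is_specialization(sub: str, sup: str) -> bool:
    """True if sub is sup or lies below it in the subclass hierarchy."""
    return sub == sup or any(is_specialization(g, sup)
                             for g in isa.get(sub, ()))


def consistent_redefinition(inherited: dict[str, str],
                            redefined: dict[str, str]) -> bool:
    """Each redefined attribute's value class must specialize the one it replaces."""
    return all(is_specialization(value, inherited[attr])
               for attr, value in redefined.items() if attr in inherited)
```

Redefining p as CHILD and pt as CHILD_PATIENT, as ADMIT_CHILD_PATIENT does, passes; redefining a CHILD_PATIENT-valued property back to plain PATIENT would fail, because it generalizes rather than specializes.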

Specialization  can  also  be  used  to  structure 
assertion  class  definitions.  The  IN_HOSPITAL 
assertion  class,  for  instance,  can  be  specialized 
by  specializing  its  arguments,  by  adding  conjuncts 
(parts),  or  even  by  redefining  some  of  its  parts. 
Thus, for child patients, IN_HOSPITAL might be
specialized  (see  Figure  9)  to  check  that  the 
patient  is  in  a  ward  accompanied  by  a  nurse. 


ASSERTION_CLASS CHILD_IN_HOSPITAL
    subclass of IN_HOSPITAL
  argument p: CHILD,
  part in-ward?: IN_WARD(p),
       with-nurse?: WITH_NURSE(p),
end {CHILD_IN_HOSPITAL}

Figure 9


We  close  this  section  by  pointing  out  that 
metaclasses  (and  metametaclasses)  are  also 
organized  into  specialization  hierarchies,  as 
suggested  in  Figure  10. 

4. RELATED WORK

In  this  section  we  argue  the  advantages  and 
utility  of  the  requirements  modeling  framework.  We 
justify  our  choices  of  specification  concepts 
(namely,  the  entity  categories,  property 
categories,  and  abstraction  mechanisms)  by 
demonstrating  that  they  are  based  on  a  consensus 
over  a  wide  variety  of  specification  and  modeling 
experience.  Moreover,  our  framework  subsumes  the 
features  of  important  requirements  languages. 

4.1 Related requirements languages

The  achievement  of  a  highly  uniform  framework  is 
a  goal  partly  inspired  by  Softech's  SADT  [22].  SADT 
offers  data  and  activity  concepts  and  uses  the  same 
graphical  box  and  arrow  notation  for  describing 
both.  A  data  concept  is  defined  (decomposed)  in  a 
diagram  showing  the  data  subparts  as  boxes 
interconnected  by  arrows  representing  activities. 
Activity  objects  are  defined  by  a  "dual"  kind  of 
diagram  in  which  activity  boxes  are  interconnected 
by data arrows.

RMF's  "object"  and  "activity"  correspond  to 
SADT's  data  and  activity,  and  RMF  adds  a  third, 
complementary  specification  unit,  assertions,  to 
facilitate  the  explicit  specification  of 
information  which  in  an  SADT  model  would  usually  be 
specified  in  accompanying  natural  language  text.  In 
addition,  RMF  makes  explicit  the  use  of  abstraction 
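principles, which we believe the modeler tends to
use implicitly during requirements modeling to
interpret an SADT model.

[Diagram: generalization hierarchy for
metametaclasses, in which METACLASS is a
specialization of CLASS and CLASS is a
specialization of ENTITY.]

[Diagram: generalization hierarchy for metaclasses,
in which OBJECT_CLASS, ASSERTION_CLASS, and
ACTIVITY_CLASS are specializations of ANY_CLASS, and
PERSON_CLASS is a specialization of OBJECT_CLASS.]

Figure 10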

By  way  of  analogy  with  SADT’s  three  arrow  types 
(Input/Control/Output)  between  data  and  activity, 
we  have  considered  various  kinds  of  relationships 
between  RMF's  three  entity  categories  as  candidate 
property  categories.  RMF  property  categories  offer 
some  explicit  interpretations  for  the  relationships 
represented  by  the  SADT  arrows. 

A  second  language  which  supports  requirements 
specifications  is  PSL  (Problem  Statement  Language) 
[26]  which  was  the  first  automated  facility  for 
storing  and  managing  "problem  statements".  A 
problem  statement  is  a  functional  specification  in 
the  form  of  a  data-oriented  target-system 
description.  Such  a  functional  specification 
differs  from  a  requirements  model  by  making  design 
decisions  that  determine  system  boundaries.  "System 
structures"  are  distinguished  from  internal  process 
and  data  structures  that  are  used  to  capture 
characteristics  of  the  target  system. 

PSL  does  offer  a  number  of  useful  relationship 
types  which  support  our  choice  of  property 
categories.  These  relationship  types  fall  into 
several  groups,  some  of  which  are  roughly  as 
follows:  (i)  System  Flow  —  a  process  may  receive 
input  data  and  generate  output  data;  (ii)  Data 
Derivation  —  a  process  may  use  data  to  derive  or 
update  data;  (iii)  System  Dynamics  —  events  occur 
when  a  condition  becomes  true/false  or  upon 
inception/termination  of  a  process,  and  an  event 
may trigger a process. (Most of the words used in
(i)-(iii) above stem from the PSL vocabulary of
keywords.) Many of the RMF property categories
coincide with these useful PSL relationships, and
all  PSL  statements  can  be  expressed  in  RMF. 
Moreover,  by  applying  the  notions  of  symmetry 
inspired  by  SADT,  we  discovered  other  useful 
relationships.  The  resultant  symmetry  among  RMF 
relationships  is  what  permits  a  rather  concise 
formalization  of  RMF. 

RSL  (the  Requirement  Statement  Language)  is  part 
of  perhaps  the  most  comprehensive  project  to  date 
to examine and improve the state of the art [2].
The  language  itself  is  a  functional  specification 
language  oriented  to  real-time  systems  such  as 
command  and  control  systems.  It  offers  a  set  of 
relationships  similar  to  but  somewhat  more  concise 
than  PSL,  with  the  major  differences  being  due  to 
its  real-time-system  orientation.  RSL  allows 
members  of  an  "entity-class"  to  be  members  of  one 
subordinate  "entity-type".  However,  there  is  no 
enforced  relationship  between  data  associated  with 
an  entity-class  and  data  associated  with  its 
entity-types.  This  is  a  simple  form  of  a  kind  of 
subclass  relationship  and  an  example  of  where  RMF 
attempts  to  provide  a  more  general  modeling 
facility.

RSL  also  offers  a  general  graphical  control  flow 
specification  feature  called  an  Rnet.  The  Rnets 
describe  partial  ordering  among  processes,  have  a 
"subnet"  notation  for  suppressing  details,  and 
provide  for  control  flow  events  to  "trigger"  other 
events.  Thus,  Rnets  provide  a  formal  structure  for 
specifying  information  about  events  and  conditions. 
RMF  uses  its  assertion  objects  in  several  roles 
(property  categories)  to  express  these  kinds  of 
information.

4.2 Semantic database modeling and
knowledge  representation 

As  illustrated  by  a  recent  workshop  [9], 
researchers  in  several  areas  of  Computer  Science, 
notably  in  Artificial  Intelligence  (AI)  and  Data 
Base  Management,  have  independently  concluded  that 
real-world  modeling  is  of  paramount  importance  for 
building computer systems, although each of these
areas  has  goals  and  perspectives  that  differ 
somewhat  from  those  of  Software  Engineering. 

One of the central themes of AI is the
Representation  of  Knowledge  [8],  which  has  been 
found  indispensable  for  simulating  human  behavior 
(e.g.,  natural  language  understanding)  and  for 
building  "expert"  systems.  Semantic  Networks  (see 
[7]  for  a  review)  have  been  used  in  AI  for  over  a 
decade  as  ways  of  representing  and  especially 
organizing  world  knowledge  through  the  notions  of 
"nodes"  (for  entities)  and  "links"  of  various  types 
(indicating  types  of  relationships).  The 
abstraction  principles  used  in  RMF  are  directly 
supported by many versions of semantic networks.
Generalization, under the heading of "IS-A
hierarchies",  has  received  considerable  attention 
in  AI.  The  basic  organization  of  RMF,  modulo  the 
assertion  classes,  has  been  directly  influenced  by 
this AI research, especially PSN [15]. The
popularity of semantic networks in AI provides
independent  motivation  to  our  work  and  further 
confidence  in  its  appropriateness. 

In  the  field  of  Data  Bases,  semantic  or 
conceptual  models  (see  [17]  for  an  overview)  have 
gained  increasing  popularity  as  ways  of  describing 
database  schemata  which  enhance  comprehensibility, 
and  hence  facilitate  database  design  and 
maintenance.  In  fact,  the  terms  "aggregation"  and 
"generalization"  were  introduced  in  [24]  in  the 
context  of  database  design.  Semantic  data  models, 
however,  concentrate  by  and  large  on  the 
specification of objects (i.e., "data"). Increasing
attention  is  being  given  to  specification  of 
constraints  on  the  validity  of  data,  and  to  a 
lesser  extent  to  specification  of  activities.  As 
discussed  in  [18],  extending  current  modeling 
capabilities  with  respect  to  logical  information 
and  activities  is  essential  to  improving  on  current 
semantic  models. 

The  Taxis  model  [19]  is  one  of  the  few  semantic 
data  models  that  extends  the  use  of  abstraction 
facilities beyond data. In fact, Taxis uses
aggregation, generalization, and classification
for organizing relations, transactions, exceptions,
and "scripts" for user interfaces [5]. Our current
work on requirements modeling has been carried out
within the framework of Taxis, with the purpose of
providing a higher-level specification language
that would be appropriate for expressing, as a
special case of requirements specification,
Corporate Requirements, the need for which is
described in [16]. We expect, as well, that the use
of  the  same  abstraction  principles  in  the  RMF  as  in 
Taxis  will  enhance  the  utility  of  the  RMF  for 
information  system  design  using  Taxis. 

Our approach is consistent with views in [29]
that  strongly  advocate  the  use  of  semantic  modeling 
in  Software  Engineering.  Among  other  semantic 
modeling  work  relevant  to  requirements  modeling,  we 
note  the  work  reported  in  [10], [23],  and  [25], 
which  similarly  emphasize  the  importance  of 
modeling  real-world  phenomena  as  a  system  analysis 
approach.  [23]  presents  a  conceptual  modeling 
approach  based  on  semantic  networks  and  uses  an 
IS-A  hierarchy  for  organizing  concepts.  Another 
language  similar  in  spirit  to  RMF  is  presented  in 
[27];  it  is  based  on  variations  of  the  same 
abstraction  principles  provided  by  RMF. 

We  point  out  that  although  RMF  has  its  roots  in 
previous  work  in  AI  and  Data  Bases,  it  provides 
novel  capabilities.  These  include  assertions  as 
another  category  of  entities,  property  categories 
for  defining  (abbreviating)  pertinent  information 
types,  and  the  uniform  application  of  the 
abstraction  principles  to  all  object  categories. 

Of  course,  any  descriptive  framework  based  on 
classes  must  acknowledge  Simula  as  a  precursor.  We 
also  acknowledge  Smalltalk  (see  Byte  magazine, 
August  1981)  as  having  influenced  our  basic 
framework;  both  RMF  and  Smalltalk  are 
"object-oriented"  (this  is  a  different  sense  of  the 
word  "object"  than  used  elsewhere  in  this  paper)  in 
that  each  specification  unit  encapsulates  the 
description of some conceptual entity. Just how the
ideas  of  these  (as  well  as  of  certain  other) 
programming  languages  compare  to  our  framework  is 
quite  interesting,  but  such  a  discussion  is  beyond 
the  scope  of  this  paper. 

5 AN ASSESSMENT OF RMF

5.1 Concerning methodology

At  the  heart  of  many  software  development 
methodologies  lies  one  or  more  abstraction 
mechanisms,  which  allow  us  to  ignore  details  at 
some level, plus a refinement principle which
provides  for  the  guided  and  gradual  reintroduction 
of  details  across  the  abstraction  dimension.  The 
aggregation  abstraction  forms  the  core  of  software 
design  methodologies  such  as  "stepwise  refinement" 
(e.g.,  [28]).  Similarly,  the  "implementation" 
dimension  is  the  basis  for  the  abstract  machine  and 
abstract  data  type  related  methodologies  (e.g., 
[20]).  The  generalization  abstraction  has  not  been 
exploited  in  Software  Engineering  as  have  the  other 
dimensions.  Yet,  it  is  our  contention  that  it  is  an 
invaluable  organizational  tool  for  system 
description  in  general,  and  for  requirements 
modeling  in  particular. 

The  main  idea  of  specification  guided  by 
generalization  is  that  a  model  can  be  constructed 
by  modeling  first  the  most  general  classes,  and 
then  proceeding  to  more  specialized  classes.  For 
example,  in  modeling  a  hospital  world,  one  might 
consider  first  the  concepts  of  patient,  doctor, 
admission,  treatment,  etc.  Later,  the  modeler  can 
differentiate  between  child  patients,  heart 
patients,  internists  and  surgeons,  surgical  and 
medical  treatments,  etc.  At  each  step,  only  the 
information  (properties)  appropriate  to  that  level 
are  specified.  (We  do  not  rule  out  the  need  to 
iterate, i.e., to go back and revise a previous level.)

Generalization  is  the  appropriate  principle  to 
exploit  when  the  difficulty  of  modeling  is  due  to  a 
large  number  of  details  rather  than  due  to  the 
complexity  of  the  system/world;  a  hierarchy  of 
classes  organized  along  this  dimension  provides  a 
convenient  structure  for  distributing  information 
(expressed  uniformly  as  properties  in  RMF)  and 
associating  it  where  it  most  naturally  belongs. 
Such  stepwise  refinement  by  specialization  [6]  is 
orthogonal  and  complementary  to  the  more  usual 
"stepwise  refinement  by  aggregation",  whose  main 
effect  is  to  decompose  complex  situations  into  a 
number  of  less  complex  ones.  Both  kinds  of 
refinement  are  orthogonal  and  complementary  to  a 
third  dimension,  the  progression  from 
"world-oriented" specifications to specifications
of  a  more  and  more  completely  implemented  system. 

5.2 An underlying model for RMF


The  underlying  model  is  based  on  a  logic 
involving  time,  in  which  we  can  make  assertions 
about  the  properties  that  any  entity  has  with 
respect  to  special  time  entities  called 
"situations".  At  any  moment  in  time  (i.e.,  in  any 
situation), the "world" being described is
characterized essentially by knowledge of what
entities are instances of what classes
(metaclasses, etc.). Object classes have as instances
those entities that are deemed to exist (i.e., to be
relevant) at that time; an activity class has as
instances activities that are occurring, or active,
at that time; an assertion class is considered to
have as instances assertions that are true at that
moment.


Each  period  when  an  entity  belongs  to  a  class  is 
characterized  by  an  initial  (insertion)  time  and  a 
final  (removal)  time.  During  this  period,  the 
object  is  expected  to  have  the  factual  properties 
induced  by  the  definitional  properties  of  the 
class.  Thus,  a  description  can,  in  fact,  be 
expressed  in  the  form  of  axioms  defining  the 
meaning of the "instance-of" and "subclass-of"
relations. 
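The interval reading of class membership can be illustrated with a small Python sketch. It is not the paper's formal logic: it assumes discrete integer time and half-open [insert, remove) membership periods, and the names `Membership`, `instance_of`, and `respects_subclass` are hypothetical. The subclass-of axiom is checked only by enumeration over a finite set of situations.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Membership:
    """One period during which an entity belongs to a class (hypothetical encoding)."""
    entity: str
    cls: str
    insert_time: int
    remove_time: int  # assumption: membership holds for insert_time <= s < remove_time


def instance_of(history: List[Membership], x: str, y: str, s: int) -> bool:
    """in(x, y, s): is x an instance of class y at situation s?"""
    return any(
        m.entity == x and m.cls == y and m.insert_time <= s < m.remove_time
        for m in history
    )


def respects_subclass(history: List[Membership], sub: str, sup: str,
                      times: Iterable[int]) -> bool:
    """Subclass-of axiom: every instance of sub at s is also an instance of sup at s."""
    entities = {m.entity for m in history}
    return all(
        instance_of(history, x, sup, s)
        for s in times
        for x in entities
        if instance_of(history, x, sub, s)
    )


history = [
    Membership("mary", "PERSONS", 0, 100),
    Membership("mary", "PATIENTS", 10, 20),
]
print(instance_of(history, "mary", "PATIENTS", 15))                   # True
print(instance_of(history, "mary", "PATIENTS", 25))                   # False
print(respects_subclass(history, "PATIENTS", "PERSONS", range(100)))  # True
```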


Property  categories  can  now  be  explained  as 
designating  axiom  schemata,  which  provide  templates 
for  the  axioms  that  represent  properties  in  the 
respective  property  categories.  For  example,  an 
initial  condition  for  an  object  class  expresses  a 
condition  that  is  true  for  each  object  that  enters 
the  class  at  its  time  of  entry.  This  is  captured  in 
our  logic  by  an  axiom  defining  the  property 
category  initially  as  a  predicate  over  properties 
of  objects,  involving  the  object,  an  assertion,  and 
the  insertion  time  (situation). 
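As a rough illustration of an axiom schema, the `initially` category can be read as a higher-order check: given the membership periods of a class and an assertion about an entity at a situation, the assertion must hold at each insertion time. This is a sketch under assumed encodings (periods as `(entity, insert_time, remove_time)` tuples, assertions as Python predicates); the blood-pressure data and all names are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Period = Tuple[str, int, int]            # (entity, insert_time, remove_time)
Assertion = Callable[[str, int], bool]   # an assertion about an entity at a situation


def initially(periods: List[Period], condition: Assertion) -> bool:
    """Axiom schema for the 'initially' category: the condition holds for each
    object at the insertion time (situation) of each of its membership periods."""
    return all(condition(entity, inserted) for entity, inserted, _removed in periods)


# Hypothetical data: blood-pressure readings indexed by (patient, time).
readings: Dict[Tuple[str, int], Optional[int]] = {("ann", 5): 120, ("bob", 8): None}


def bp_recorded(entity: str, s: int) -> bool:
    return readings.get((entity, s)) is not None


admitted = [("ann", 5, 30), ("bob", 8, 12)]
print(initially(admitted, bp_recorded))      # False: bob has no reading at time 8
print(initially(admitted[:1], bp_recorded))  # True
```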


The  axiom  schemata  give  precise  meaning  to 
property  categories,  so  that  property  categories 
can  now  be  seen  as  abbreviations  for  commonly 
encountered  restrictions  on  properties.  The  way  is 
open  for  users  of  RMF  to  extend  the  list  of 
property  categories  as  dictated  by  the  exigencies 
of  special  domains  of  discourse. 


Such  an  underlying  model  relates  an  RMF 
description  to  formal  semantics  which  will  be 
useful  for  developing  theoretical  and  pragmatic 
tools  supporting  the  consistency  of  descriptions. 


The  big  advantage  of  descriptions  based  on  logic 
with  time  is  that  the  descriptions  are  quite 
declarative.  One  has  a  view  of  the  entire  time-line 
(more  precisely,  over  all  relevant  situations). 
Information  that  is  typically  represented  by 
control  flow  specifications  in  other  models  is 
subsumed  here  by  logical  formulae  involving 
situations  plus  information  about  the  relationships 
between  situations,  which  impose  a  partial  (time) 
ordering  on  situations. 


Since  descriptive  languages  are  notoriously 
ambiguous,  we  are  working  on  a  detailed  formal 
definition  of  a  language  based  on  RMF.  We  limit  our 
discussion  to  an  outline  of  the  underlying 
formalism  and  the  advantages  of  such  a  definition. 
(For  a  detailed  presentation,  see  [13].) 


5.3 Uniformity

There  are  several  senses  in  which  the  framework 
exhibits  high  uniformity.  Given  the  "instance-of" 
relation and initial/final situations, it is
straightforward to define primitive insert/remove
actions which add entities to, and delete them from, classes.
These actions are intuitive for objects. Applying
the  same  ideas  to  assertions  and  activities,  we 
model  activation/termination  of  activities,  and 
becoming  true/false  for  assertions  as  the 
insertion/removal  of  entities  of  the  respective 
categories. 
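The uniformity argument can be made concrete with a minimal sketch in which a single membership log serves all three categories: creating an object, activating an activity, and an assertion becoming true are all `insert`, while their counterparts are all `remove`. The `World` class and the class names `BED_AVAILABLE` and `beds_free` are hypothetical illustrations, not RMF constructs.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class World:
    """Hypothetical sketch: one membership log for objects, activities and assertions."""

    def __init__(self) -> None:
        self.open: Dict[Tuple[str, str], int] = {}  # (entity, class) -> insert time
        self.periods: Dict[Tuple[str, str], List[Tuple[int, int]]] = defaultdict(list)

    def insert(self, entity: str, cls: str, t: int) -> None:
        if (entity, cls) in self.open:
            raise ValueError(f"{entity} is already an instance of {cls}")
        self.open[(entity, cls)] = t

    def remove(self, entity: str, cls: str, t: int) -> None:
        start = self.open.pop((entity, cls))  # raises KeyError if never inserted
        self.periods[(entity, cls)].append((start, t))


w = World()
w.insert("mary", "PERSONS", 0)                    # an object comes into existence
w.insert("adm1", "ADMIT_PATIENT_TO_HOSPITAL", 3)  # an activity becomes active
w.insert("beds_free", "BED_AVAILABLE", 0)         # an assertion becomes true
w.remove("adm1", "ADMIT_PATIENT_TO_HOSPITAL", 5)  # the activity terminates
w.remove("beds_free", "BED_AVAILABLE", 4)         # the assertion becomes false
print(w.periods[("adm1", "ADMIT_PATIENT_TO_HOSPITAL")])  # [(3, 5)]
```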

The  imposition  of  generalization  hierarchies  on 
each  object  category  results  in  an  interesting 
perspective  as  well.  For  objects,  one  can  view  an 
entity  as  starting  out  in  a  particular  class  and 
moving  around  on  the  hierarchy  throughout  its 
lifetime.  For  example,  a  person  could  become  a 
child  patient,  later  (by  virtue  of  growing  older) 
an  adult  patient,  and  so  on.  An  activity's  behavior 
can  be  viewed  at  several  levels  of  generalization 
depending  on  what  aspects  of  its  participants 
(inputs, outputs, controls), its effects
(preconditions, postconditions, and conditions it
maintains), and its components (parts) are associated
at each level. For assertions, the imposition of a
generalization hierarchy is particularly novel and
interesting. Assertions viewed as entities are
propositions  whose  (semantic)  interpretation  (i.e. 
specification  of  under  what  circumstances  they  are 
true)  depends  on  the  classes  in  which  they  reside. 
Clearly,  in  RMF,  one  assertion  class  is  a 
specialization  of  another  only  if  the  former 
logically  implies  the  latter.  Property  inheritance 
ensures  consistency  between  assertion  classes. 
Thus,  the  generalization  abstraction  organizes 
assertions  according  to  both  their  arguments  and 
their  assertional  import. 
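The condition that a specializing assertion class must imply its generalization can be tested mechanically when assertions range over a finite domain. The following sketch assumes assertions are Python predicates and checks implication exhaustively; on an infinite domain such a check can only refute, not prove, the implication. The fever predicates are hypothetical examples.

```python
from typing import Callable, Iterable, Tuple


def may_specialize(sub: Callable[..., bool], sup: Callable[..., bool],
                   domain: Iterable[Tuple]) -> bool:
    """sub may specialize sup only if sub implies sup; here implication is
    checked exhaustively over a finite domain of argument tuples."""
    return all(sup(*args) for args in domain if sub(*args))


# Hypothetical assertion classes over a body temperature in degrees Celsius.
def has_fever(temp: int) -> bool:
    return temp >= 38


def has_high_fever(temp: int) -> bool:
    return temp >= 40


temps = [(t,) for t in range(34, 43)]
print(may_specialize(has_high_fever, has_fever, temps))  # True
print(may_specialize(has_fever, has_high_fever, temps))  # False
```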

Concerning  property  categories,  it  turns  out 
that  most  of  those  we  have  found  useful  can  be 
defined  in  terms  of  a  small  number  of  items  of 
information.  There  are  many  forms  of  uniformity 
(symmetry,  duality)  present.  Many  of  the  axiom 
schemata  for  the  property  categories  are  virtually 
identical  except  for,  e.g.,  the  entity  category  of 
the  property  subject  or  value,  whether  the  insert 
vs.  remove  time  is  mentioned,  the  order  of  binding 
through  quantifiers  over  classes  and  time,  etc.  As 
a simple example, the initially, precondition, and
inserted-by property categories all assert
something about the insert time of the property
subject, while the postcondition and removed-by
property categories both assert something about the
remove  time  of  the  property  subject.  Parts  and 
constraints  are  examples  of  property  categories  of 
objects  that  pertain  to  the  entire  instance 
interval  and,  in  fact,  their  schemata  are  identical 
except  for  the  fact  that  a  part  associates  an 
object  while  a  constraint  associates  an  assertion. 
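The observation that parts and constraints share one schema can be sketched as a single function, `holds_throughout`, instantiated twice: once with a check on an object value (part) and once with an assertion (constraint). This assumes discrete integer time and half-open instance intervals; the toy world, including the class name `SEXES`, is hypothetical.

```python
from typing import Callable, List, Tuple

Period = Tuple[str, int, int]  # (entity, insert_time, remove_time); s ranges over [insert, remove)


def holds_throughout(periods: List[Period], check: Callable[[str, int], bool]) -> bool:
    """Schema shared by categories that pertain to the entire instance interval."""
    return all(check(e, s) for e, start, end in periods for s in range(start, end))


def part_ok(periods: List[Period], pv: Callable[[str, str, int], str],
            instance_of: Callable[[str, str, int], bool], prop: str, value_class: str) -> bool:
    """A part associates an object: at every s, (prop of e at s) is in value_class."""
    return holds_throughout(periods, lambda e, s: instance_of(pv(e, prop, s), value_class, s))


def constraint_ok(periods: List[Period], assertion: Callable[[str, int], bool]) -> bool:
    """A constraint associates an assertion: it must be true at every s."""
    return holds_throughout(periods, assertion)


# Hypothetical toy world: mary belongs to PERSONS from time 0 up to (not including) 3.
def toy_pv(entity: str, prop: str, s: int) -> str:
    return "female"


def toy_instance_of(value: str, cls: str, s: int) -> bool:
    return cls == "SEXES" and value in {"male", "female"}


periods = [("mary", 0, 3)]
print(part_ok(periods, toy_pv, toy_instance_of, "sex", "SEXES"))  # True
print(constraint_ok(periods, lambda e, s: s < 2))                 # False: fails at s = 2
```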

6 CONCLUSION

We  do  not  claim  to  have  invented  the  abstraction 
mechanisms  combined  in  the  framework;  rather,  we 
have  argued  that  they  are  independently  motivated 
by  several  modeling  endeavors.  What  we  HAVE  done  is 
to  combine  them  in  a  simple,  constructive  way,  and 
we have explained some principles of interaction
and their appropriateness for requirements
specification.

When  we  say  RMF  "captures  more  world  knowledge" 
than other specification techniques, we are
referring to the semantic information that is
conveyed  by  the  three  concept  types,  the  kinds  of 
relationships  provided,  and  the  use  of  assertions 
in  roles  where  English  is  used  in  other  techniques. 
More  specifically,  we  mean  that  RMF  captures  world 
knowledge  more  formally  (in  the  same  sense  that 
Predicate  Calculus  is  more  formal  than  English), 
and without resorting to more implementation-oriented
concepts.

We  wish  to  stress  that  successful  modeling 
depends  not  just  on  how  one  represents  knowledge 
but  on  how  one  structures  or  organizes  it.  For 
example,  Predicate  Calculus  would  be  adequate,  from 
the point of view of expressibility, for
representing  knowledge;  however,  it  does  not 
provide  good  structuring  facilities.  We  have  argued 
in  this  paper  that  structuring/organizing  a  model 
should  be  based  on  useful  abstraction  mechanisms 
such  as  those  offered  by  RMF. 

In  this  paper  we  have  bypassed  discussing  the 
important  task  of  how  the  relevant  terms  of  the 
domain  of  discourse  are  initially  identified  and 
recorded.  We  propose  that  this  task  should  be  done 
separately  and  thoroughly  prior  to  RMF  modeling.  We 
intend  to  use  an  SADT-like  technique  to  set  up  an 
initial ("structured") lexicon of the terms whose
semantic  relationships  are  of  importance  to  the 
model.  Our  current  research  [13]  investigates  the 
connection  between  such  a  lexicon  and  the  RMF 
model:  how  to  proceed  from  the  former  to  the  latter 
and  how  to  maintain  consistency  between  them. 

A  common  problem  with  using  abstractions  is  that 
humans  often  over-abstract  in  an  effort  to 
establish  regularity  in  their  environment;  thus, 
although  at  first  sight  all  patients  admitted  to 
the  hospital  must  have  blood-pressure  taken,  some 
subclasses  such  as  AMPUTEE  may  not,  and  even  the 
most  heartless  hospital  will  not  reject  a  patient 
because  he  doesn't  know  his  health  insurance 
number.  One  aspect  of  our  current  research  concerns 
appropriate  responses  to  such  exceptional 
situations  and  how  exception  specifications  serve 
as  yet  another  abstraction  principle  in  organizing 
large, detailed descriptions [4].

We  also  believe  (calendar,  clock)  time  to  be 
essential  to  requirements  modeling,  since  many 
requirements  involve  expressing  things  about  time. 
The  time  model  proposed  in  [10]  would  be 
appropriate  and  fits  directly  into  our  framework  as 
object  modeling;  we  would  extend  it  to  activity  and 
assertion  entities  within  RMF. 

Finally,  within  the  Taxis  Project  at  the 
University  of  Toronto,  we  are  applying  the  same 
general  principles  to  different  phases  of  Software 
Engineering,  and  we  believe  they  will  be  a  key 
factor  in  developing  a  unified  approach  to  Software 
Engineering.
Abstract

The  paper  describes  some  aspects  of  a  Requirements  Modeling  Language 
(RML) which can be used in developing information systems. RML is based on the
idea that a requirements specification should embody a conceptual world model,
and that the language for expressing it should provide facilities for organizing
and abstracting details, yet at the same time have qualities such as precision,
consistency, and unambiguity. In our case, these qualities are obtained by
relating RML to a logic involving time. The translation to logic defines RML,
including novel aspects such as property categories, classes of assertions, and
the uniform application of well-known abstraction principles to data, event, and
assertion objects within a single framework. This demonstrates that a language
can  offer  highly  structured  and  convenient  mechanisms  for  requirements 
specifications  while  having  solid,  mathematical  underpinnings. 
1. Introduction

The  importance  of  requirements  specifications  as  an  initial  step  of  software 
development  has  been  long  apparent  in  Software  Engineering  ([IEEE-TSE  77]).  Its  relevance 
to  database  design  was  clearly  stated  in  [Lum  80],  where  it  was  called  the  "Corporate 
Requirements  Analysis".  The  general  purpose  of  this  activity  is  to  acquire  information  about 
the  way  in  which  the  enterprise  functions,  as  well  as  its  goals,  in  order  that  the  eventual 
computer  system  (hardware  &  software)  be  able  to  support  its  current  and  projected 
activities.

As  such,  the  apparent  subject  matter  of  the  requirements  specification  is  the  enterprise 
itself¹, not just the data component of the planned system. Many of the requirements
specification languages mentioned in the database literature, e.g., PSL/PSA [Teichroew 77]
and [Shu 83], restrict their attention exclusively to the characteristics of the data to be
stored, and the operations on it which are to be supported by the projected system. This
means that such important information as definitions of terms, complex semantic
constraints, goals, as well as performance requirements, cannot be stated uniformly in the
requirements specification. Important database activities such as view integration rely on
exactly  such  information. 

For  this  reason,  a  fundamental  tenet  of  the  present  work  is  that  the  requirements 
language,  called  RML  in  our  case,  must  be  adequate  to  describe  models  of  the  real  world. 
To  emphasize  the  distinction,  we  note  that  in  world  modeling  the  statement  "an  employee 
may  optionally  have  a  supervisor"  means  that  a  person  may  or  may  not  have  a  supervisor, 
while  in  "system  modeling",  it  would  normally  be  taken  to  mean  that  the  data  may  or  may 
not be present; hence a "non-optional" property of a real object (e.g., location) may be
"optional" in the system. The ability to model both the world and the projected system
within the same framework allows new kinds of information to be captured explicitly in the
requirements (e.g., requirements about the completeness/incompleteness of the information
stored  in  the  IS). 

Note  also  that  RML  does  not  provide  primitive  "update"  operators  to  modify  the  model  of 
the  world,  as  do  conventional  data  models,  because  a  requirements  model  is  not  a  model 
of  the  world  at  a  particular  moment  of  time;  instead,  the  model  is  assumed  to  span  a  time 
interval  covering  the  existence  of  the  real-world  portion  which  is  of  interest,  and  the 
designer  is  supposed  to  express  constraints  which  were  obeyed  during  this  interval; 
metaphorically,  the  designer  is  taking  a  post-apocalyptic  view  of  the  world,  and  describes 
semantic  conditions  which  characterized  it. 

In  designing  RML,  we  were  guided  by  a  number  of  general  principles  and  assumptions. 

First,  a  good  modeling  language  should  allow  the  definition  of  objects  in  the  domain  of 
discourse,  the  description  of  changes  (events)  in  the  world,  and  the  statement  of 
constraints and assumptions. For this reason, RML allows the description of information in
three general categories: entity/data, event/activity, and assertion.

¹ Which could potentially include one or more computer subsystems.

In  order  to  improve  the  chances  of  obtaining  correct  and  complete  specifications,  and  in 
agreement  with  principles  enunciated  in  SDM  [McLeod  78]  and  SADT  [Ross  77],  RML  also 
encourages redundancy in the specification by viewing the same information from multiple
viewpoints (e.g., entities, events, assertions).

Second,  uniformity  is  felt  to  be  an  important  characteristic  of  a  language  which  is  easy  to 
learn  and  use.  For  this  reason,  RML  adopts  an  object-centered  view,  where  all  information 
is  recorded  in  terms  of  objects,  inter-related  by  properties  and  grouped  into  classes. 

Thirdly,  in  order  to  manage  large,  complex  descriptions,  RML  supports  a  structured 
organization  based  on  widely  used  abstraction  principles  [Smith  77],  [McLeod  81]: 
aggregation,  classification,  and  generalization.  Again,  it  applies  these  uniformly  to  all  three 
kinds of information. Structuring activities in the same manner as data provides several
advantages, as noted in [Borgida 81]; the other type of description unit, assertions,
provides (i) the expressive power of logic, and (ii) a means of structuring complex constraints
using the same abstraction principles as for entities and events, since assertions are
objects themselves.

A  fourth  principle  is  to  make  it  easier  to  state  frequently  occurring  expressions  and 
constraints.  For  this  reason,  RML  incorporates  versions  of  many  popular  devices  present  in 
languages such as SADT [Ross 77] and PSL [Teichroew 77]. In addition, RML provides a
number of abbreviatory devices, including the ability to extend the language by adding a
restricted class of new abbreviations.

Finally,  in  order  to  facilitate  the  eventual  implementation  of  the  information  system,  RML 
was  designed  in  a  sense  as  a  "sibling"  of  the  Taxis  programming  language  [Mylopoulos  80]. 
As such, RML shares with Taxis several of the above fundamental principles, including the
use  of  classes/objects/properties  as  fundamental  building  stones,  and  the  emphasis  on  the 
uniform  support  of  the  abstraction  principles  for  both  data  and  transactions. 

Details  of  the  central  features  of  RML  appear  in  Section  2,  while  for  a  more  complete 
description,  the  reader  is  referred  to  [Greenspan  83]. 

In  Section  3,  we  argue  for  the  need  to  present  a  more  precise  specification  of  RML,  and 
proceed to do so using a logic with temporal aspects. Finally, in Section 4, we present an
example RML specification of "time concepts", as an illustration of RML’s capacity to define
terms other than parts appearing in the system. We have relegated to an appendix a
discussion of the reasons for choosing the particular formal logic used in the paper, as well
as  its  exact  definition. 

Our  work  cuts  across  several  areas  of  Computer  Science.  We  note  that  some  other 
researchers  have  advocated  the  use  of  conceptual  modeling  for  requirements  specifications, 
notably Yeh [80]. This kind of modeling is related to information system (or database)
modeling, as in the work of Bubenko [81], Roussopoulos [79], and Solvberg [79], among
others, as well as the work on conceptual data models. The work on Knowledge
Representation  techniques  in  Artificial  Intelligence,  especially  "semantic  networks",  as  well  as 
object-centered  languages  such  as  Simula  and  Smalltalk  have  also  influenced  our  work.  A 
more  detailed  discussion  of  these  relationships  can  be  found  in  [Greenspan  82,83]. 


2.  Introduction  to  RML 

2.1.  Some  aspects  of  modeling  entities  in  RML 
Entities  are  used  to  model  the  objects  of  the  world,  such  as  persons,  places,  and  things. 
The chief characteristic of such objects is that they can be meaningfully said to exist or
not at any moment of time. Entities are inter-related through properties and are organized
into classes, whose instances (called tokens) share common properties. At any moment of
time a class has some set of instances, which are generally intended to model objects
existing in the world at that time. The specification of entity classes resembles in many
ways the descriptions current in semantic data models (e.g., [McLeod 78], [Mylopoulos 80]).

The  RML  specifier  defines  a  class  in  part  by  describing  the  properties  which  will  be 
applicable  to  the  instances  of  the  class.  For  example, 

entity class PERSONS
    necessary parts
        sex: {male, female}
    associations
        current_spouse: PERSONS
        address: ADDRESSES

In  addition  to  giving  the  name  of  the  property,  the  definition  of  a  property  for  a  class 
introduces  one  or  more  constraints  on  the  values  of  this  property  as  a  function.  Two 
mechanisms  are  illustrated  in  the  above  example: 

•  A range of values is specified as the class name following the colon; it
indicates that the value of this property, if it exists, is an instance of that
class, and hence can be qualified by the attributes defined for that class. If a
property does not have a value for an object, it will be assigned the special
value "null".

•  One or more property categories, appearing in bold as prefixes to lists of
properties, describe in more detail the functionality of the attribute.

In the above example, the range of "current_spouse" is specified to be the class
PERSONS, and the association property category specifies that the spouse of a person
may change over time (unlike, say, the mother of a person). The property category
necessary requires that such properties have non-null values at all times.
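The two property categories in this example can be read as checks on a property's value history during one instance's lifetime. The sketch below is illustrative only: it uses `None` for "null", and it assumes that a property outside the association category must keep a single value once set (the mother of a person), which is one reading of the contrast drawn above rather than a definition from RML. The function name is hypothetical.

```python
from typing import Hashable, List, Optional


def satisfies_category(values: List[Optional[Hashable]], necessary: bool = False,
                       association: bool = False) -> bool:
    """values: a property's value at successive times in one instance's lifetime,
    with None standing for RML's "null".
    - necessary: the value must be non-null at all times.
    - not association: assumed here to mean the value, once set, never changes.
    """
    if necessary and any(v is None for v in values):
        return False
    if not association and len({v for v in values if v is not None}) > 1:
        return False
    return True


print(satisfies_category(["female", "female"], necessary=True))    # True
print(satisfies_category([None, "ann", "bea"], association=True))  # True
print(satisfies_category(["ann", "bea"]))                          # False
print(satisfies_category([None, "x"], necessary=True))             # False
```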


² This does not, however, preclude the definition of a class such as PERSONS_WHO_EVER_EXISTED,
whose instances at any point may be objects that do not actually occur in the world at that time.

³ Note that some classes define sets of values, such as integers or strings.
The catalogue of property categories offered by RML initially contains terms such as
association, input, control, and necessary, culled from experience with previous
specification languages such as SADT and PSL/PSA. The user of the language may add to
this catalogue by defining new property categories and their meaning, although this is not
expected to be the normal modus operandi.

Note that classes can also be organized into metaclasses, which may have associated
property definitions describing their instances. For example, PERSON_CLASS might have
as instances CHILDREN, PATIENTS, etc., and each might have a property "average_age".

Classes  are  also  organized  into  a  specialization  hierarchy,  where  the  instances  of  the 
subclass are always to be instances of the superclass (e.g., the PATIENTS subclass of
PERSONS). One important consequence of this organization is that properties can be
inherited  down  the  hierarchy;  for  example,  since  a  patient  is  also  a  person,  every  property 
definition  of  PERSONS  is  also  a  property  definition  of  PATIENTS  (a  patient  also  has  a 
spouse,  etc.),  although  subclasses  may  have  additional  constraints  on  these  attributes,  and 
may  have  other  attributes  applicable  to  them. 
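Property inheritance down the specialization hierarchy can be sketched as a merge of property definitions along is-a links. In this hypothetical `RmlClass` encoding, definitions map a property name to its range class; a local definition simply replaces an inherited one, whereas a complete checker would also verify that the new range is a subclass of the inherited range. The `ward: WARDS` property is an invented illustration.

```python
from typing import Dict, Iterable, Optional


class RmlClass:
    """Hypothetical sketch of an RML class: local property definitions
    (name -> range class) plus is-a links to superclasses."""

    def __init__(self, name: str, props: Optional[Dict[str, str]] = None,
                 supers: Iterable["RmlClass"] = ()) -> None:
        self.name = name
        self.own_props = dict(props or {})
        self.supers = tuple(supers)

    def all_props(self) -> Dict[str, str]:
        """Inherited definitions first, then local ones, which replace inherited ones
        (range narrowing is not verified in this sketch)."""
        merged: Dict[str, str] = {}
        for sup in self.supers:
            merged.update(sup.all_props())
        merged.update(self.own_props)
        return merged


PERSONS = RmlClass("PERSONS", {"sex": "{male, female}",
                               "current_spouse": "PERSONS",
                               "address": "ADDRESSES"})
PATIENTS = RmlClass("PATIENTS", {"ward": "WARDS"}, supers=[PERSONS])
print(sorted(PATIENTS.all_props()))
# ['address', 'current_spouse', 'sex', 'ward']
```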


2.2.  Event  objects  in  RML 

Event  classes  are  intended  to  capture  information  about  events  in  the  world  and  are 
meant  to  have  as  instances  at  any  time  events  which  are  taking  place  then.  We  view  event 
tokens  as  having  a  single  occurrence,  over  a  continuous  period  of  time,  and  model 
repetitive  or  intermittent  events  as  event  classes  which  are  instances  of  metaclasses. 

Events are related by properties to other events, e.g., parts, which must occur as part of
this  occurrence  of  an  event.  Events  are  also  related  to  entities  participating  in  them  by 
properties  with  entity  values.  Some  property  categories  specific  to  events  are  Input, 
Output  and  Control,  whose  meaning  is  derived  from  the  use  of  these  terms  in  SADT 
([Ross  77]). 

event class ADMIT_PATIENT_TO_HOSPITAL with
    necessary input
        who: PERSON
        ward: WARD
        doc: DOCTOR
    output
        patient: PATIENT
    part
        a1: OBTAIN_INFO/from = who of this/
        a2: ASSIGN_BED/to_whom = who of this/

In order to fully specify the relation of an event to component or other events, we often
have to relate their participants. This recalls parameter binding, but in our case is just a
way of stating equality constraints on the attributes of property values. In our example, if e
is an instance of the above defined class, then the "a1" attribute of e (denoted for short
by "a1 of e" here) must have its "from" property value equal to "who of e".
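Since a binding is only an equality constraint between property values, checking it on a given event token is straightforward. The sketch assumes, hypothetically, that event tokens are nested dictionaries of property values; `binding_holds` is an illustrative name, not part of RML.

```python
from typing import Any, Dict


def binding_holds(e: Dict[str, Any], part: str, part_prop: str, own_prop: str) -> bool:
    """Check a binding such as  a1: OBTAIN_INFO/from = who of this/  on event token e:
    (part_prop of (part of e)) must equal (own_prop of e)."""
    return e[part][part_prop] == e[own_prop]


e = {"who": "mary", "ward": "W7",
     "a1": {"from": "mary"},
     "a2": {"to_whom": "mary"}}
print(binding_holds(e, "a1", "from", "who"))     # True
print(binding_holds(e, "a2", "to_whom", "who"))  # True
```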
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Such  ''binding''  constraints  are  useful  for  entity  objects  also.  For  example,  if  the  class 
PERSON  had  the  property  definition 

current _ spouse  PERSON/ current _ spouse:?/7/s/ 

then current_spouse would be symmetric, with the convention that "this" refers to the
instance of the class being defined.
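The binding /current_spouse=this/ requires that, whenever x has a spouse y, the current_spouse of y is x, and that is exactly symmetry. The Python sketch below is only an illustration: the dictionary of spouse values at a single time point and the function name `binding_violations` are hypothetical, and `None` stands in for the null value $. It lists the persons for whom the binding fails.

```python
from typing import Dict, List, Optional

def binding_violations(spouse: Dict[str, Optional[str]]) -> List[str]:
    """Return persons x whose non-null current_spouse y does not have
    current_spouse of y equal to x (the binding /current_spouse=this/)."""
    return [x for x, y in spouse.items()
            if y is not None and spouse.get(y) != x]

# Hypothetical snapshot: cy names dee as spouse, but dee has no spouse recorded.
print(binding_violations({"ann": "bob", "bob": "ann", "cy": "dee", "dee": None}))
# ['cy']
```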

As  with  entity  classes,  events  can  be  organized  in  a  specialization  hierarchy,  with  identical 
rules  for  proper  specialization. 

event class ADMIT_FOR_SURGERY is-a ADMIT_PATIENT_TO_HOSPITAL with

    output
        patient: SURGICAL_PATIENT;
    part
        a3: ORDER_DIET/for=who of this/;

We also remark that, symmetrically, events can be associated with entities through
properties. For example, PATIENTS may have additional properties:

    terminators
        cure: RELEASE_PATIENT/who=this/;
        kill: DIE/who=this/;


2.3.  Assertions  in  RML 

We  have  argued  in  the  introduction  that  constraints  are  indispensable  if  we  are  to  provide 
proper specifications. Some commonly used constraints of restricted form have been built
into  the  RML  notation  and  principles  through  such  things  as  ranges  for  properties,  property 
categories,  binding  and  rules  about  specialization  hierarchies. 

In order to deal with more general constraints, RML also provides a many-sorted logical
language with identity. In this language we must explicitly recognize that an RML
description provides a "longitudinal" view of the world through time, so one of the sorts is
that of time points, which are assumed to be linearly ordered in an infinite sequence.^4
RML assertions may involve the following functions and predicates involving time:

• time comparators =, < and ≤;

•  functions  start  and  stop  specifying  the  start  and  end  time  of  an  event  token; 

•  3-place  predicate  in(x,y,s)  indicating  whether  x  is  an  instance  of  y  at  time  s; 
using  infix  notation,  this  is  written  as  (x  in  y  at  s). 

• 3-place function pv(x,y,s) returning the value of property y of object x at
time s; the infix notation for this is (y of x at s).

In addition, RML assertions subsume the usual notations for the logic of arithmetic and
strings, and may involve the special constant $ (null).
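As an informal aid, and not part of RML, the time-dependent primitives can be mimicked in Python over a finite history sampled at integer time points. This sketch makes several assumptions: the class `History`, its field names, the method name `in_` (chosen because `in` is a Python keyword), and the use of `None` for $ are all hypothetical. RML time is dense, so sampling at integers is only an approximation.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Hashable, Optional, Set, Tuple

NULL = None  # stands in for RML's null constant $

@dataclass
class History:
    """Hypothetical finite record of a world history, sampled at discrete times."""
    membership: Dict[Tuple[Hashable, Hashable], Set[int]] = field(default_factory=dict)
    values: Dict[Tuple[Hashable, Hashable, int], Any] = field(default_factory=dict)
    intervals: Dict[Hashable, Tuple[int, int]] = field(default_factory=dict)

    def in_(self, x, y, s: int) -> bool:
        """(x in y at s): is x an instance of class y at time s?"""
        return s in self.membership.get((x, y), set())

    def pv(self, x, y, s: int) -> Optional[Any]:
        """(y of x at s): value of property y of object x at time s, or NULL."""
        return self.values.get((x, y, s), NULL)

    def start(self, e) -> int:
        """Start time of event token e."""
        return self.intervals[e][0]

    def stop(self, e) -> int:
        """End time of event token e."""
        return self.intervals[e][1]

h = History()
h.membership[("mary", "PATIENT")] = {1, 2, 3}
h.values[("mary", "location", 2)] = "ward_7"
h.intervals["adm1"] = (1, 3)
print(h.in_("mary", "PATIENT", 2), h.pv("mary", "location", 2), h.pv("mary", "location", 5))
# True ward_7 None
```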


^4 Note that these time points are abstract mathematical concepts, not calendar times; the latter are
expressible in RML, as illustrated in Section 4.
Assertions can be associated with instances of a class or metaclass through properties.
For example, we could define the property "right_place" for ADMIT_PATIENT_TO_HOSPITAL
by

    right_place: ward of this at stop(this) =
        location of (who of this at stop(this)) at stop(this)

which  asserts  that  at  the  end  of  an  admission  the  location  of  the  patient  is  on  the  desired 
ward.  In  order  to  simplify  the  form  of  the  assertions,  we  adopt  two  rules: 

• whenever the expression "at ..." is omitted, it is assumed by default to be "at
now", where now is a special reserved variable;

• property categories can restrict the range of values of now in a constraint;
otherwise, now, like all other free variables, is assumed to be universally
quantified.

Thus,  the  above  constraint  could  have  been  stated  as 

    final-condition
        right_place: ward of this = location of who of this

since the "final-condition" property category makes now be the stopping time of the
activity.
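The effect of the final-condition category can be illustrated with a small Python check that evaluates right_place only at now = stop(this). All data and names here are hypothetical. The sketch also assumes that the ward of an admission does not change while the admission is in progress, so it is stored once rather than per time point.

```python
# Hypothetical admissions: stop time, desired ward, and the admitted person ("who").
admissions = {
    "adm1": {"stop": 10, "ward": "W3", "who": "mary"},
    "adm2": {"stop": 12, "ward": "W5", "who": "john"},
}
# Hypothetical recorded locations, keyed by (person, time point).
location = {("mary", 10): "W3", ("john", 12): "W4"}

def right_place(adm_id: str) -> bool:
    """Evaluate 'ward of this = location of who of this' with now bound
    to stop(this), as the final-condition category prescribes."""
    adm = admissions[adm_id]
    now = adm["stop"]  # final-condition fixes now = stop(this)
    return adm["ward"] == location.get((adm["who"], now))

violations = [a for a in admissions if not right_place(a)]
print(violations)  # ['adm2']
```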

As  with  entities  and  events,  we  contend  that  in  cases  where  the  requirements  become 
very  large,  there  will  be  a  need  to  organize  the  assertions  themselves.  The  organization  of 
assertions provides an index whereby the specifier can (1) check whether closely related
assertions have been used before, and thus specify new assertions incrementally, and (2) when some
assertion is to be modified, obtain an indication of what other assertions one might have to
reconsider in light of this change.

Since uniformity is one of our guiding principles in designing RML, we have chosen to
also model assertions as objects organized into classes. In this case, an assertion class is
to  be  interpreted  as  a  predicate  declaration.  In  analogy  with  entity  and  event  classes,  an 
assertion  class  will  have  zero  or  more  attributes,  which  in  this  case  include  the  free 
variables  (arguments)  of  the  predicate,  and  these  variables  will  be  typed  by  the  usual 
method  of  property  definitions.  The  instances  of  an  assertion  class  represent  those 
propositions  obtained  by  instantiating  the  free  variables  of  the  predicate  which  are  true  at 
that  moment. 

For example, the assertion IS_TREATED_WITH, describing what treatments a patient
receives, might be modeled by the class

assertion class IS_TREATED_WITH with

    arguments
        p: PATIENTS;
        t: TREATMENTS;

and at any moment of time, this class is presumed to contain the tokens whose pair of
attribute values consists of a patient p and a treatment t that p is receiving.
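One way to picture an assertion class as a time-indexed predicate is to store, for each time point, the set of argument tuples that are true at that moment. The following Python sketch does this. The extension `IS_TREATED_WITH` and the helper `holds` are hypothetical illustrations, not RML constructs.

```python
from typing import Dict, Set, Tuple

# Hypothetical extension: time point -> true (patient, treatment) tuples.
IS_TREATED_WITH: Dict[int, Set[Tuple[str, str]]] = {
    1: {("mary", "insulin")},
    2: {("mary", "insulin"), ("john", "physio")},
}

def holds(assertion: Dict[int, Set[Tuple[str, str]]], p: str, t: str, s: int) -> bool:
    """The predicate with its time argument: true iff a token with
    arguments (p, t) is an instance of the assertion class at time s."""
    return (p, t) in assertion.get(s, set())

print(holds(IS_TREATED_WITH, "john", "physio", 2))  # True
print(holds(IS_TREATED_WITH, "john", "physio", 1))  # False
```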


Sometimes, we will want to relate this assertion with others, as was done for events and
entities. Properties with assertion values can be used for this purpose, with the continued
understanding that if at any moment an attribute value is not null, the corresponding
assertion is true. For example, we could expand the description of IS_TREATED_WITH as:

assertion class IS_TREATED_WITH with

    arguments
        p: PATIENTS;
        t: TREATMENTS;
    necessary
        c1: AVAILABLE/treatment=t, place=hospital of p/;
        c2: RECOMMENDED/treatment=t, disease=diagnosis of p/;

Once again, property categories are used to impose certain interpretations on the
properties. Thus, the c1 attribute value cannot be null, and hence for any instance of
IS_TREATED_WITH there must be an instance of AVAILABLE with the appropriate
properties.
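The consequence of the necessary c1 property can be checked mechanically on a snapshot. For every (p, t) token of IS_TREATED_WITH, a matching AVAILABLE instance with treatment=t and place=hospital of p must exist. In this Python sketch, a single time point, the sets of tuples, and the function name `c1_violations` are all illustrative assumptions.

```python
# Hypothetical snapshot at one time point.
available = {("insulin", "General"), ("physio", "General")}   # (treatment, place)
hospital_of = {"mary": "General", "john": "St_Luke"}
treated_with = {("mary", "insulin"), ("john", "physio")}        # (patient, treatment)

def c1_violations(treated, avail, hospital):
    """Tokens whose necessary c1 attribute would have to be null, i.e. no
    AVAILABLE instance matches treatment=t, place=hospital of p."""
    return sorted(
        (p, t) for (p, t) in treated
        if (t, hospital.get(p)) not in avail
    )

print(c1_violations(treated_with, available, hospital_of))  # [('john', 'physio')]
```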

As with other classes, we can now organize assertions into IS-A hierarchies. To be
consistent with the general rules relating the instances of subclasses and superclasses, we must
have that if P is an assertion subclass of Q, then P must be more "restrictive" than Q, and
all instances of P must be instances of Q. The natural conclusion is that P is-a Q only if P
implies Q logically. Note, however, that we do not intend the converse to hold, namely that
P => Q will necessarily result in P is-a Q.

Specialization proceeds with assertion classes in the same manner as with event classes.
Thus, we could define

assertion class RECEIVES_CHEMOTHERAPY is-a IS_TREATED_WITH with

    arguments
        p: CANCER_PATIENTS;
        t: CANCER_DRUGS;
    necessary
        c3: RECEIVES_PSYCHOTHERAPY/p=p/;

Thus,  receiving  chemotherapy  is  restricted  to  subclasses  of  patients  and  treatments,  and  has 
the  additional  condition  of  requiring  concurrent  psychotherapy. 
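On a finite sample of time points, "P is-a Q" can be refuted, though not established, by finding a tuple that is true of P but not of Q. This is the same one-way relationship as "is-a only if implies" above. The following Python sketch runs that check together with the c3 condition. The extensions and function names are hypothetical.

```python
from typing import Dict, Set, Tuple

Ext = Dict[int, Set[Tuple[str, ...]]]  # time point -> true argument tuples

def respects_isa(sub: Ext, sup: Ext) -> bool:
    """Every tuple true of the subclass at time s is also true of the superclass at s."""
    return all(tuples <= sup.get(s, set()) for s, tuples in sub.items())

def c3_holds(chemo: Ext, psycho: Ext) -> bool:
    """necessary c3 (binding p=p): each chemotherapy patient also receives psychotherapy."""
    return all((p,) in psycho.get(s, set())
               for s, tuples in chemo.items() for (p, _t) in tuples)

is_treated_with: Ext = {1: {("ann", "cisplatin"), ("bob", "aspirin")}}
receives_chemo: Ext = {1: {("ann", "cisplatin")}}
receives_psycho: Ext = {1: {("ann",)}}

print(respects_isa(receives_chemo, is_treated_with), c3_holds(receives_chemo, receives_psycho))
# True True
```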

Hierarchies  of  assertion  classes  can  be  useful  to  define  groups  of  related  and  potentially 
ambiguous  terms;  the  organization  allows  the  similarities  and  differences  to  be  viewed.  For 
example,  one  often  needs  terms  for  expressing  temporal  relationships  between  events  and 
their  interval  of  occurrence,  yet  these  terms  are  notoriously  imprecise  in  English.  In  RML, 
we  can  define  a  hierarchy  of  assertion  classes  with  two  arguments  of  type  ANY-EVENT. 
The hierarchy of names provides a rough guide to the intended meaning of these terms;
for example, the term EARLIER is intended to relate an event which finishes before the
other one starts, as indicated by its position under the class DOESN’T OVERLAP. The
meaning of these constraints can be clarified by adding appropriate assertions about start
and stop times. For example, OCCURS-DURING specifies that the interval of the second
event contains the interval of the first one, while CO-STARTS has the additional condition
that the two events start at the same time. The important point is that these definitions
now appear as an integral part of the requirements specification, and that assertion classes
can be used to structure and organize related definitions. This organizational ability is useful
in light of the well-known problems of gathering and modifying large amounts of detail
during requirements definition.
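The prose definitions of EARLIER, OCCURS-DURING and CO-STARTS can be written as Python predicates over start and stop times. This is a sketch only: the paper's own hierarchy figure is not reproduced here, and the use of `<=` follows from an assumption that event intervals are open, as in the event axioms, so two events that merely touch at an endpoint do not overlap. The final assertions check, on a small grid, that each specialization implies its parent term.

```python
from itertools import product
from typing import NamedTuple

class Interval(NamedTuple):
    start: float
    stop: float

def doesnt_overlap(a: Interval, b: Interval) -> bool:
    # open intervals: touching at an endpoint is not overlapping
    return a.stop <= b.start or b.stop <= a.start

def earlier(a: Interval, b: Interval) -> bool:
    # a finishes before b starts; placed under doesnt_overlap
    return a.stop <= b.start

def occurs_during(a: Interval, b: Interval) -> bool:
    # the interval of the second event contains that of the first
    return b.start <= a.start and a.stop <= b.stop

def co_starts(a: Interval, b: Interval) -> bool:
    # occurs_during plus the additional condition of a common start time
    return occurs_during(a, b) and a.start == b.start

# Sanity check of the hierarchy on a small grid of intervals
grid = [Interval(s, t) for s in range(4) for t in range(4) if s < t]
assert all(doesnt_overlap(a, b) for a, b in product(grid, grid) if earlier(a, b))
assert all(occurs_during(a, b) for a, b in product(grid, grid) if co_starts(a, b))
```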


3.  A  definition  of  the  semantics  of  RML 

There  are  several  reasons  that  compel  us  to  present  a  more  formal  account  of  the 
meaning  of  RML.  To  begin  with,  the  process  of  formalization  itself  forces  us  to  consider  in 
detail  the  precise  meaning  of  the  constructs,  and  often  leads  to  the  discovery  of 
ambiguities and asymmetries.

Second,  given  a  requirements  specification,  there  is  a  range  of  questions  that  one  often 
wishes to have answered, including:

•  Is  the  specification  consistent  or  self-contradictory? 

•  Can  a  certain  situation  arise  according  to  the  specification? 

A formal semantics provides a domain where the definition of such terms as "consistency"
can be given, and sets a standard against which to judge the correctness of any computer
tools that purport to assist the user in answering the above questions.

Finally,  the  users  and  readers  of  the  language  can  use  the  formal  specification  as  the  final 
arbiter  in  cases  where  there  is  disagreement  concerning  the  exact  meaning  of  some 
specification. 


For reasons more fully explained in the appendix, we have chosen to express the
semantics of RML by providing a method to translate an arbitrary RML specification into a
set of formulae in a Predicate Calculus. This provides us with notions such as "consistency"
and "answer to a question". Time must obviously play an important role in our logical
calculus, and we have chosen to incorporate it by adding time arguments to most
predicates and functions. We have eschewed more complex temporal logics and non-linear
time-lines because of the relative simplicity of our approach, its expressive power, and
our notion of the model as a look back at the world.^5


3.1.  The  logical  language  with  time 

The  logic  to  be  presented  will  in  fact  extend  the  language  of  RML  assertions  outlined  in 
section  2.3,  through  the  addition  of  new  predicates  and  functions,  but  we  will  define  it  in 
its  entirety  here. 


The  following  are  the  basic  predicates  and  functions  which  will  be  used  to  explain  the 
meaning  of  various  constructs  in  RML. 

in(k,K,s)          predicate asserting that token k is an instance of class K (or class k
                   is in metaclass K) at time s;

defp(K,p)          time-independent function which gives the class or metaclass to which
                   the value of attribute p must belong for instances of K (the
                   definitional property function);

pv(k,p,s)          function which gives the value of the attribute p of element k at time
                   s (the factual property function);

start(e)           function providing the start time of event token e;

stop(e)            function providing the end time of event token e;

is-a(K1,K2)        time-independent predicate asserting that class (metaclass) K1 is a
                   subclass of class (metaclass) K2;

occur(k,K,s1,s2)   non-primitive predicate used to specify the maximal time interval(s) s1
                   to s2 when k is an instance of K, definable in terms of start and stop
                   for events, and more generally using the predicate in();

=, ~=              predicates to determine if two objects are identical or not;

<, ≤               predicates to compare time points;

$                  a special constant used to denote the null value (meaning "no value").


We  will  also  feel  free  to  use  the  symbols  relating  to  the  calculus  of  numbers  and  strings, 
though  we  will  not  mention  these  explicitly  in  the  future. 

In writing assertions, we will assume that all free variables are universally quantified, and
whenever a symbol could be either a constant or a variable, it will be a variable unless
expressly stated that it is a constant.

^5 For further discussion, see the Appendix.

In order to avoid problems with nonsensical assertions, the logical language is made
many-sorted,^6 so that one can specify, for example, that the third argument of in() is of
sort time, and hence variables or constants of sort token or property cannot be used
there. In the discussion to follow we will omit the sort information, since it can be easily
inferred from the context.

3.2.  The  basic  axioms  of  RML 

First we give some obvious axioms describing our notion of time. These may involve two
special time constants, -∞ and +∞, used to mark the ends of time ("creation" and "judgment
day"), so we can represent events that last forever. The following axioms present the
desired properties of time.

    (s1 ≤ s2) ∧ (s2 ≤ s3) => (s1 ≤ s3)        {≤ is transitive}
    (s1 ≤ s2) ∧ (s2 ≤ s1) => (s1 = s2)        {≤ is anti-symmetric}
    (s1 ≤ s2) ∨ (s2 ≤ s1)                     {≤ is a total order}
    (s1 < s2) <=> (s1 ≤ s2) ∧ ~(s1 = s2)      {definition of <}
    s1 < s2 => (∃s) s1 < s ∧ s < s2           {time is dense}
    (-∞ ≤ s) ∧ (s ≤ +∞)                       {-∞ and +∞ are the ends of time}

Next, we provide the central axioms of RML describing the way in which definitional and
factual properties relate to instances and subclasses of classes.

Null-value axioms: {base predicates on $ are false, and functions on $ produce $}

    ~in($,K,s) ∧ ~in(k,$,s)
    pv($,p,s)=$ ∧ defp($,p)=$ ∧ start($)=$ ∧ stop($)=$

Property-value constraint: {property values must belong to the range specified
by the definitional property}

    in(d,C,s) ∧ pv(d,p,s)=y ∧ y~=$ => (∃D) defp(C,p)=D ∧ in(y,D,s)

Extensional IS-A constraint: {each instance of a subclass is in its superclass}

    in(d,D,s) ∧ is-a(D,C) => in(d,C,s)


^6 See the Appendix for precise definitions.

Intensional IS-A constraint: {inherited attributes have more specialized ranges}

    defp(C,p)=E ∧ E~=$ ∧ is-a(D,C) => (∃F) defp(D,p)=F ∧ F~=$ ∧ is-a(F,E)

Definition of occur(): {maximal periods during which an object is in a class}

    occur(e,E,s1,s2) => s1 < s2
    (occur(e,E,s1,s2) ∧ occur(e,E,s3,s4)) => s2 < s3 ∨ s4 < s1
    occur(e,E,s1,s2) ∧ s1 < s ∧ s < s2 => in(e,E,s)
    in(e,E,s) => (∃s1,s2) occur(e,E,s1,s2) ∧ s1 < s ∧ s < s2
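To make the maximality in occur() concrete, here is a Python sketch that merges a finite list of open intervals, during which in(e,E,s) holds, into maximal occurrence intervals. It assumes that membership is given as a finite union of open intervals with numeric endpoints, which real RML histories need not be. The function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # open interval (s1, s2) with s1 < s2

def maximal_occurrences(spans: List[Interval]) -> List[Interval]:
    """Merge overlapping open intervals into maximal ones. Intervals that
    merely touch at an endpoint stay separate, because the shared endpoint
    is covered by neither open interval."""
    merged: List[Interval] = []
    for s1, s2 in sorted(spans):
        if merged and s1 < merged[-1][1]:
            # overlaps the current maximal interval: extend it
            merged[-1] = (merged[-1][0], max(merged[-1][1], s2))
        else:
            merged.append((s1, s2))
    return merged

def in_at(spans: List[Interval], s: float) -> bool:
    """in(e,E,s) under this representation: s lies strictly inside some interval."""
    return any(s1 < s < s2 for s1, s2 in spans)

print(maximal_occurrences([(3, 5), (1, 4), (6, 8)]))  # [(1, 5), (6, 8)]
```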


3.3.  Axioms  for  general  class  definitions 

The definition of a class introduces new symbols into the language, as well as a number
of axioms. In this section we consider those aspects of the semantics of a class/metaclass
definition which are independent of the kind of class being defined (entity/activity/assertion).

To begin with, property categories other than arguments will be represented by
predicates on properties, and their meaning will be defined by axiom schemata. The
property category states constraints about property values and the times when these
constraints must hold. For this reason, their definition must anticipate the possibility of
"binding constraints" being associated with the property definition.

For example, the property category necessary would be defined by

    NECESSARY(p) ≡ defp(x,p)=y ∧ y~=$ => [in(z,x,now) => pv(z,p,now)~=$ ∧ χ]

while initial-condition could be

    INITIAL-CONDITION(p) ≡ defp(x,p)=y ∧ y~=$ => [occur(z,x,now,t) => pv(z,p,now)~=$ ∧ χ]

The slots marked by χ will be filled by formulas derived from binding constraints in a
manner to be described below.
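The NECESSARY schema can be read as a check over sampled times. Every instance z of the class at time now must have a non-null value for p, and the binding formula filling the slot χ must hold. In this Python sketch, the slot χ is modelled as an optional callable `binding`, and all names and data are hypothetical.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Optional, Tuple

def necessary_holds(
    instances_at: Callable[[int], Iterable[Hashable]],
    pv: Callable[[Hashable, str, int], Optional[object]],
    prop: str,
    times: Iterable[int],
    binding: Callable[[Hashable, int], bool] = lambda z, now: True,
) -> bool:
    """NECESSARY(prop) on sampled times: each instance z at time now has
    pv(z, prop, now) non-null and satisfies the binding formula (slot chi)."""
    return all(
        pv(z, prop, now) is not None and binding(z, now)
        for now in times
        for z in instances_at(now)
    )

members: Dict[int, List[str]] = {1: ["adm1"], 2: ["adm1", "adm2"]}
values: Dict[Tuple[str, str, int], str] = {
    ("adm1", "who", 1): "mary", ("adm1", "who", 2): "mary", ("adm2", "who", 2): "john",
}
ok = necessary_holds(lambda s: members.get(s, []),
                     lambda z, p, s: values.get((z, p, s)), "who", [1, 2])
print(ok)  # True
```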

The other property categories are similarly defined, and a designer who wishes to
introduce a new property category must define a corresponding predicate.

Consider now the definition of an arbitrary class, which has the general syntax:

<class B> in <metaclass M> is-a <class C> with
    <property-category π>
        ...
        <property p> : <range D> / <binding assertion ψ> /

As a result of this definition, we must add B to the list of class constants, p to the list
of property constants (if not already there), and add the following axioms:^7


    is-a(B,C)                          {B is a subclass of C}
    in(B,M,s)                          {B is an instance of M}
    defp(B,p) = D                      {values of p belong to D}
    π(p) with χ replaced by Expand(ψ)  {the values of p satisfy the property-category constraints}

^7 Here the only variables will be the ones dealing with time states.

Expand is a function which takes the binding formula ψ, consisting of one or more
equalities of the form

    gn ... of g1 = qm ... of q1 of c

rewrites them into the form

    gn ... of g1 of p of this = qm ... of q1 of c

then expands the abbreviation which infers the time when a predicate or function is to be
evaluated by introducing the default time now

    pv(pv(...pv(pv(this,p,now),g1,now)...),gn,now) = pv(pv(...pv(c,q1,now)...),qm,now)

and finally produces a single formula by conjoining these identity assertions.
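The three steps of Expand can be traced with a small string-rewriting sketch in Python. The sketch makes these assumptions: the function names `pv_chain` and `expand`, the tuple encoding of each equality as (g-path, q-path, c), and the listing of paths with the innermost property first are all hypothetical. It uses the argument order pv(object, property, time) given in Section 3.1.

```python
from typing import List, Tuple

def pv_chain(base: str, props: List[str], now: str = "now") -> str:
    """Nested term for 'pk of ... of p1 of base', innermost property first."""
    term = base
    for prop in props:
        term = f"pv({term}, {prop}, {now})"
    return term

def expand(p: str, equalities: List[Tuple[List[str], List[str], str]]) -> str:
    """Each (g_path, q_path, c) stands for 'gn ... of g1 = qm ... of q1 of c'.
    The left side is re-rooted at 'p of this', both sides receive the
    default time now, and the identities are conjoined."""
    parts = []
    for g_path, q_path, c in equalities:
        lhs = pv_chain("this", [p] + g_path)
        rhs = pv_chain(c, q_path)
        parts.append(f"{lhs} = {rhs}")
    return " ∧ ".join(parts)

# a1: OBTAIN_INFO/from=who of this/
print(expand("a1", [(["from"], ["who"], "this")]))
# pv(pv(this, a1, now), from, now) = pv(this, who, now)
```

For instance, the property c1: AVAILABLE/treatment=t, place=hospital of p/ would be encoded as `expand("c1", [(["treatment"], [], "t"), (["place"], ["hospital"], "p")])`.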


3.4.  Axioms  for  entity  classes 

We begin with the axioms defining some of the constants and permanent objects which
can be used to build descriptions in RML, including ANY-ENTITY, ANY-ENTITY-CLASS,
NUMBER and STRING.

    in(d,D,s) => in(d,ANY-ENTITY,s)
    in(D,M,s) => in(D,ANY-ENTITY-CLASS,s)^8
    is-a(NUMBER,ANY-ENTITY) ∧ is-a(STRING,ANY-ENTITY)

We will also need axioms asserting the existence of all the numbers as instances of
NUMBER, all strings as instances of STRING, and defining the usual arithmetic and string
operations and predicates.^9 These will be omitted here for brevity.

The following two additional axioms will ensure that numbers and strings are distinct from
all other entities.

    in(n,NUMBER,s) ∧ in(n,K,s) => is-a(K,NUMBER) ∨ is-a(NUMBER,K)
    in(z,STRING,s) ∧ in(z,K,s) => is-a(K,STRING) ∨ is-a(STRING,K)

In a class definition such as the one illustrated in Section 2, designers may introduce new
entity class constants "in-line" by using Pascal-like enumeration or range expressions for
domains D. Their semantics are the obvious ones:

^8 Similar axioms for ANY-EVENT, etc. will be omitted here.

^9 Note that if RML is to incorporate the full semantics of set-theoretic concepts such as cardinality, average,
etc., we will need non-first-order constructs in our axioms.
If D is an enumerated set specifier of the form {k1,k2,...,kn}, we must create constant
symbols k1,...,kn, unless they exist already, and the axioms

    in(k1,D,s) ∧ ... ∧ in(kn,D,s)
    in(k,D,s) => k=k1 ∨ ... ∨ k=kn
    if k1,... are integers then is-a(D,NUMBER)
    (in(k,D,s) => in(k,K,s)) => is-a(D,K)
    (in(k,K,s) => in(k,D,s)) => is-a(K,D)

If D's definition was of the form {i1..i2}, the first three axioms become

    is-a(D,NUMBER)
    in(n,D,s) <=> i1 ≤ n ≤ i2
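The in-line domain axioms can be mirrored by membership tests for enumerated and range domains, together with a finite check of the subclass rule (in(k,D,s) => in(k,K,s)) => is-a(D,K). In this Python sketch, the function names are hypothetical, range bounds are treated as inclusive (as in clock: {1..24} of Section 4), and the subclass test can only inspect a supplied finite universe of candidate values.

```python
from typing import Callable, Hashable, Iterable

def enum_domain(*members: Hashable) -> Callable[[Hashable], bool]:
    """{k1,...,kn}: exactly the listed constants are instances."""
    allowed = frozenset(members)
    return lambda k: k in allowed

def range_domain(i1: int, i2: int) -> Callable[[object], bool]:
    """{i1..i2}: integers n with i1 <= n <= i2 (bool is excluded on purpose)."""
    return lambda n: type(n) is int and i1 <= n <= i2

def is_subdomain(d_member, k_member, universe: Iterable) -> bool:
    """Finite check of (in(k,D,s) => in(k,K,s)) over the given universe."""
    return all(k_member(x) for x in universe if d_member(x))

clock = range_domain(1, 24)
print(clock(24), clock(0), is_subdomain(range_domain(1, 12), clock, range(30)))
# True False True
```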


3.5.  Axioms  for  events 

The following axioms deal with the semantics of event occurrences.

Events progress in time:

    start(e) < stop(e)

Event tokens occur only once:

    in(e,E,s) => start(e) < s ∧ s < stop(e)
    (∃E)(∀s) start(e) < s ∧ s < stop(e) => in(e,E,s)

To model events with multiple occurrences (e.g., going to school with interruptions), event
classes have their own occurrence defined by the occurrence of their instances:

    occur(E,REPETITIVE-EVENT-CLASS,s1,s2) <=> (∃e) occur(e,E,s1,s2)
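The event axioms can be illustrated with a Python event token that enforces start(e) < stop(e) when it is created. The token is an instance of its class exactly when start < s < stop. A repetitive event class then occurs over the intervals of its instances. The class and function names, and the "school term" data, are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

@dataclass(frozen=True)
class EventToken:
    name: str
    start: float
    stop: float

    def __post_init__(self):
        # "Events progress in time": start(e) < stop(e)
        if not self.start < self.stop:
            raise ValueError(f"{self.name}: start must precede stop")

    def in_at(self, s: float) -> bool:
        # single occurrence: an instance exactly when start < s < stop
        return self.start < s < self.stop

def class_occurrences(tokens: Iterable[EventToken]) -> List[Tuple[float, float]]:
    """A repetitive event class occurs over the intervals of its instances."""
    return sorted((t.start, t.stop) for t in tokens)

terms = [EventToken("term2", 20, 30), EventToken("term1", 0, 10)]
print(class_occurrences(terms))  # [(0, 10), (20, 30)]
```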


3.6.  Axioms  for  assertions 

Every time we encounter the definition of an assertion class A, with n properties p1,...,pn in
the arguments category, we define a new (n+1)-ary predicate A, the last place of sort
time, and add axioms specifying that these property values must be known, their
combination unique, and that the predicate holds if and only if there is a corresponding
token in the class:

    y ~= z ∧ in(y,A,s) ∧ in(z,A,s) => ~(pv(y,p1,s)=pv(z,p1,s) ∧ ... ∧ pv(y,pn,s)=pv(z,pn,s))
    in(y,A,s) => pv(y,p1,s)~=$ ∧ ... ∧ pv(y,pn,s)~=$
    in(a,A,s) <=> A(pv(a,p1,s),...,pv(a,pn,s),s)
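The first two assertion-class axioms, known arguments and a unique argument combination, can be checked on a snapshot of an assertion class at one time s. In this Python sketch, the snapshot representation (token name mapped to its tuple of argument values) and the function name are assumptions, and `None` stands in for $.

```python
from typing import Dict, Optional, Tuple

def assertion_axioms_hold(tokens: Dict[str, Tuple[Optional[str], ...]]) -> bool:
    """tokens maps each token y in A at time s to (pv(y,p1,s), ..., pv(y,pn,s)).
    Checks that every argument is non-null and that distinct tokens never
    share the same combination of argument values."""
    values = list(tokens.values())
    all_known = all(v is not None for args in values for v in args)
    unique = len(set(values)) == len(values)
    return all_known and unique

print(assertion_axioms_hold({"t1": ("mary", "insulin"), "t2": ("john", "physio")}))  # True
```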

Suppose that the property definition of p for some class specifies an RML assertion φ
instead of a domain class D. First, we assume that the abbreviation introduced by dropping
the time argument of predicates and functions is expanded by the parsing process. If φ is
a closed formula, then we must add a new assertion class constant A_φ (as with in-line
entity class definitions), and replace the property definition by

    p : A_φ

Finally, we define the appropriate instances for A_φ by adding the axiom

    φ <=> (∃y) in(y,A_φ,now)

If φ contains occurrences of the special free variable this (it should have no other free
variables), A_φ must be defined with one argument a: ANY, and the two definitions above will
change to

    p : A_φ/a=this/

and

    in(x,A_φ,now) ∧ pv(x,a,now)=y <=> φ<y/this>


4. Calendar or clock time: an example specification

As an illustration of the capabilities of RML, and as an obviously useful part of many
specifications, we present here a model of the familiar calendar time. It is based on the
observation [Bubenko 80] that time periods (SECONDS, MINUTES, HOURS, DAYS, MONTHS,
YEARS, WEEKS) can be viewed as events of different kinds, which are appropriately
related to time points.

One way to model HOURS is to first capture in a metaclass TIME-EVENTS the
characteristics common to all the time periods, such as MINUTES, HOURS, etc. At any
moment of time there is at least one event of each such kind going on;
this is captured by the assertion property "eternal". In fact, except at the end points, there
is only one such event occurring at any time ("up-to-two", "unique-except-at-ends").^10

event metaclass TIME-EVENT is-a ANY-EVENT-CLASS with
    necessary
        eternal: ∃t (t in this);
        up-to-two: (x in this) and (y in this) and (z in this) => x=y or z=x or z=y;
        unique-except-at-ends: (x in this) and (y in this) => x=y or
            start(x)=stop(y) or start(y)=stop(x);

An hour is then specified to have an associated clock name from 1 to 24, and 60 sub-events
of class MINUTES (which also have clock names). One of these minutes must be occurring at any
moment ("exhaustive"), and the first and last minute must start and end the hour ("starting-minute",
"ending-minute"). Finally, we need a constraint ("sequence") ensuring that hours
succeed each other with appropriate clock numbers. (A similar axiom on MINUTES ensures
that the 60 minutes in the hour follow each other.)


^10 For technical reasons discussed in [McDermott 81], we should also include a "comprehension axiom",
which ensures that time periods don't "shrink" to infinitesimally small size. We omit this here for the sake of brevity.
event HOURS in TIME-EVENTS is-a ANY-EVENT with
    parts
        m1: MINUTES/clock=1/;
        ...
        m60: MINUTES/clock=60/;
    necessary
        clock: {1..24};
        exhaustive: (m1~=$ or m2~=$ ... or m60~=$);
        sequence: (x in HOURS at s) and stop(this)=start(x) =>
            (clock of x at s) = mod(clock of this, 24) + 1;
    initial-condition
        starting-minute: CO-STARTS(m1, this);
    final-condition
        ending-minute: CO-FINISHES(m60, this);
end HOURS;

Similar definitions to that of HOUR would be given for the other calendar time units.
Observe that we have made class properties out of common constraints on each of the
time units, and these are automatically imposed by our "factual property induction" axiom on
all instances of TIME-EVENT. The other constraints for HOUR appear to be specific to it,
and hence have been attached to each hour.

Having defined this, if we now wish to say that some event e is to happen only in the
minute after 12 noon, the following assertion could be used:

    e in ANY-EVENT => ∃h (h in HOURS and clock of h = 12 and m1 of h ~= null)
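The "sequence" constraint's arithmetic, mod(clock of this, 24) + 1, makes clock names wrap from 24 back to 1. The final assertion picks out the hour with clock 12 while its first minute m1 is occurring. This Python sketch checks both on plain integers. The function names and the integer encoding of the current hour and minute are illustrative assumptions.

```python
from typing import List

def next_clock(clock: int) -> int:
    """sequence: the hour starting when this one stops has clock mod(clock, 24) + 1."""
    return clock % 24 + 1

def hours_sequence_ok(clocks: List[int]) -> bool:
    """Check a run of consecutive hours against the sequence constraint."""
    return all(b == next_clock(a) for a, b in zip(clocks, clocks[1:]))

def in_minute_after_noon(hour_clock: int, minute_clock: int) -> bool:
    """The example assertion: the current hour has clock 12 and its m1 is occurring."""
    return hour_clock == 12 and minute_clock == 1

print(next_clock(24), hours_sequence_ok([23, 24, 1, 2]))  # 1 True
```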


5.  Conclusions 

We  mention  here  two  additional  aspects  of  the  RML  approach  to  requirements 
specification  which  have  not  been  formally  treated  in  this  paper. 

One advantage of the "world modeling" approach is the ability to state important non-functional
requirements as part of the specification. For example, for a naval information
system, we might require that the location of ships be known correctly within 100 miles:

    (∀y/SHIPS)(∀z/STORED-SHIPS) Denotes(z,y) =>
        Difference_in_miles(location of y, stored_location of z) < 100
Such  a  constraint  can  be  construed  as  a  requirement  on  the  "accuracy"  of  the  system  and 
would  be  extremely  important  for  the  design  of  a  real-time  data-gathering  system 
incorporating  the  database  above.  Our  current  research  is  examining  ways  to  facilitate  the 
description  of  system  objects  corresponding  to  real-world  ones,  in  terms  of  assertions 
about  the  required  correspondence  between  the  two. 
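To show how such an accuracy requirement could be tested against data, the sketch below interprets Difference_in_miles as great-circle (haversine) distance between latitude/longitude pairs. That interpretation, the mean Earth radius constant, and the dictionary-based `denotes`, `location` and `stored_location` data are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Tuple

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an assumption of this sketch
LatLon = Tuple[float, float]  # (latitude, longitude) in degrees

def difference_in_miles(a: LatLon, b: LatLon) -> float:
    """Hypothetical Difference_in_miles: haversine great-circle distance."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))

def accuracy_violations(denotes: List[Tuple[str, str]],
                        location: Dict[str, LatLon],
                        stored_location: Dict[str, LatLon],
                        limit: float = 100.0) -> List[Tuple[str, str]]:
    """Pairs (z, y) with Denotes(z, y) whose stored position is off by >= limit miles."""
    return [(z, y) for z, y in denotes
            if difference_in_miles(location[y], stored_location[z]) >= limit]
```

One degree of longitude along the equator comes out at roughly 69 miles, so a stored position one degree off satisfies the 100-mile requirement while one two degrees off does not.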

Secondly, it would appear that a methodology which approaches requirements
specification as an iterative process is more likely to lead to a complete and accurate
model. In [Greenspan 83], we demonstrate how an extension of the SADT notation can
be used to build up an initial diagrammatic "road map" or "skeleton" around which an RML
specification can be "fleshed out". We feel that this is an important practical advantage
because

•  SADT  is  a  widely  used  requirements  specification  notation  which  has  proven 
itself  in  many  practical  situations; 

•  SADT  provides  a  pictorial  notation  which  appears  to  be  appealing  to  both 
non-computer  experts  as  well  as  systems  analysts,  and  as  such  can  form  the 
basis  of  a  communication  channel  between  users  and  designers  —  an 
important  function  of  requirements,  as  noted  in  [Lum  78]. 

The  connection  between  SADT  and  RML  is  not  accidental  —  it  was  planned  for  from  the 
beginning  —  and  was  facilitated  by  SADT’s  ability  to  describe  things  other  than  just 
computer  systems,  and  its  uniform  dual  treatment  of  data  and  activities. 

In  summary,  RML  is  a  language  which  was  crafted  with  the  intention  of  providing  users 
with  a  tool  for  stating  requirements  in  the  form  of  world  models.  Our  intent  is  to  facilitate 
the  task  of  stating  precise  and  consistent  specifications  by  providing 
organizational/abstraction  principles,  which  structure  large  specifications  and  make  them 
easier to construct and modify, as well as useful abbreviatory devices. Furthermore, by
matching the features of RML with those of the Taxis programming language, we hope to
further  facilitate  the  implementation  of  useful  information  systems. 

Since  RML  has  quite  a  number  of  novel  features,  especially  relating  to  the  manner  in 
which  constraints,  hitherto  expressed  in  English,  can  be  precisely  stated,  we  have  presented 
a  formal  explanation  of  these  features.  We  have  done  so  by  giving  translation  procedures 
from  RML  to  a  logic  with  temporal  indices,  and  in  this  sense  any  RML  specification  is 
equivalent to some set of logical axioms. Hopefully, though, RML makes it easier to gather,
express  and  maintain  such  a  set  of  axioms,  especially  when  it  gets  large  and  complex. 

Concerning  the  axiomatization  of  RML,  we  note  that  much  of  what  needs  to  be  said 
about  entities,  events,  and  assertion  objects  can  be  stated  once  for  all  objects,  with  few 
additional axioms characterizing the three object types. This supports our contention that
RML  treats  these  concepts  in  a  relatively  uniform  manner. 

The work presented in this paper is similar in intent to that of Roussopoulos [79], who
provides  a  "frame-based"  approach  to  data  modeling  which  also  was  shown  equivalent  to  a 
first-order  logic.  RML  considers  in  addition  the  temporal  and  dynamic  aspects  of  the  world, 
the  uniform  application  of  abstraction  principles,  and  devices  to  abbreviate  descriptions. 

The specification technique proposed by Bubenko and his group ([Bubenko 80, 81],
[Lundberg  82])  also  views  the  requirements  as  a  set  of  axioms  about  the  world,  and 
recognizes  the  importance  of  time  and  activities,  but  does  not  explicitly  address  the 
problem  of  structuring  such  a  set  of  axioms  (our  abstraction  principles)  nor  the  issue  of 
facilitating  their  expression. 

In conclusion, we intend our work to address part of the problem described by
[Guttag  82],  where  in  the  authors'  experience,  existing  specification  tools  were  found  to 
be  rigorously  formal,  or  else  usefully  structured,  but  not  both. 
I. Some issues in the selection of a formal semantics for RML

To  begin  with,  let  us  remember  that  we  will  describe  the  semantics  of  RML  by  presenting 
a  method  for  translating  an  arbitrary  RML  specification  into  a  set  of  assertions  in  a 
Predicate  Calculus. 

It appeared advantageous to use a logic as the language in which the meaning of RML
is expressed because (i) "assertions" in some form are already part of RML, (ii) logics come
ready-made with notions such as "consistency" and "deduction", which are relevant for a
requirements language, and (iii) there has been considerable work on automatic deduction,
especially for first-order languages, and we can hope to incorporate such work into
future (semi-)automated tools that assist users in developing proper requirements.

The choice of a logical language was motivated by additional considerations. First, RML is
a specification language, whose intended use is to place constraints on potential
system implementations while interfering as little as possible with the task of the
implementor; in RML one can be as precise as desired, but also vague.
This is in contrast with languages such as programming languages, which, relatively speaking,
are geared to uniformly precise specifications.

A  second  important  characteristic  of  requirements  languages  is  that  they  are  almost 
always  used  to  model  or  describe  dynamic,  as  opposed  to  static,  enterprises.  This  makes 
the  notions  of  events  and  changes,  together  with  some  idea  of  time,  of  central  importance 
to  the  language. 

Accepting  time  or  change  as  a  basic  fact,  one  is  still  left  with  a  number  of  questions 
about  the  nature  of  objects  and  events  that  may  occur. 

Is time viewed as a sequence of static snapshots, so that one can always ask about "the
next state"? Tied to this is usually an assumption (a general "frame
axiom") that there are no changes in the domain of interest unless some specified event
has caused them. In a requirements specification language, it seems appropriate that
the designer be required to expend extra effort in order to state such a condition as a
constraint, since it limits the possible implementations.¹ For this reason, we have chosen
not to use one of the traditional programming logics, or temporal logics for programming
systems ([Ben-Ari 81], [deCastilho 82]).

We have chosen to avoid more complex systems such as temporal or tense logics (e.g.,
[Rescher 71]) or intensional logics (e.g., [Montague 73]), because they provide no
additional expressive power, only expressive convenience; since this logic is not being used
by end-users, this convenience is outweighed by the complexity of these logics, including
the absence of significant automatic theorem-proving capabilities in these domains.
However, there is nothing to prevent us from offering the abbreviations proposed in
temporal logics (e.g., a default NOW time, abbreviations for "P until Q", etc.) to users of
RML.

¹ At the very least, the designer should be able to state "default rules" selectively, in the style
of [Reiter 80], say, rather than have them imposed automatically.

The  traditional  way  of  extending  FOL  to  deal  with  change  is  through  the  situational 
calculus  ([McCarthy  68]),  where  an  extra  argument  is  added  to  all  time-dependent 
functions  and  predicates,  while  events  are  specified  through  predicates  on  the  situations 
they  "connect".  Within  this  general  framework,  several  alternatives  arise  as  to  the  nature  of 
the  "situations". 
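A minimal Python sketch of the situational-calculus idea, assuming situations are modelled as sets of ground facts; the predicate patient and the event discharge are hypothetical examples. The time-dependent predicate gets an extra situation argument, and the event is a predicate over the pair of situations it connects, which is the "state change" reading discussed in the next paragraph.

```python
# Minimal sketch of the situational-calculus idea: a time-dependent
# predicate takes an extra situation argument, and an event is a relation
# between the situations it connects. All names here are hypothetical.

from typing import FrozenSet, Tuple

Situation = FrozenSet[Tuple[str, str]]  # a situation = set of ground facts (pred, arg)

s0: Situation = frozenset({("patient", "mary")})


def holds(pred: str, arg: str, s: Situation) -> bool:
    """patient(mary) becomes holds("patient", "mary", s): note the extra argument s."""
    return (pred, arg) in s


def discharge(who: str, before: Situation, after: Situation) -> bool:
    """Event predicate over the two situations it connects."""
    return holds("patient", who, before) and not holds("patient", who, after)


s1: Situation = s0 - {("patient", "mary")}
print(holds("patient", "mary", s0), holds("patient", "mary", s1))  # True False
print(discharge("mary", s0, s1))                                   # True
```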

The weakest assumption, adopted in programming logics and in papers like [deCastilho
82] and [McCarthy 68], is that there is no constraint on the ordering of situations. This
generality, however, is obtained by modeling events as "state changes" (sets of pairs of
states), which does not allow one to talk about what may happen while an event is occurring,
and makes it difficult to talk about the durations of event occurrences, among other things.

It  is  also  possible  to  consider  time  as  a  partial  order  of  time  points,  as  done  in 
[McDermott  81],  [McCawley  81],  and  tense  logics,  with  several  possible  futures  such  that 
each  is  a  total  order.  This  is  advantageous  for  planning  actions,  revising  beliefs  or 
representing  the  meaning  of  English  sentences  referring  to  future  possibility,  but  seems 
superfluous baggage for our purposes, since the specification can be taken to be a "God's-eye
view" of the situation, and we are not concerned with providing a formal system in
which one can reason about how to achieve certain goals.

We are thus led to a metaphysically linear view of time. In the present work, we have
chosen the conservative approach of using time points as indices, because it is relatively
well understood; we are, however, investigating the alternative of using time intervals as
primitives, as suggested in [Allen 81], since this approach avoids some thorny problems
about what happens at the time point where one event starts immediately as another
one stops.
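The thorny case can be seen in a small Python sketch, assuming integer time points and event occurrences represented as (start, end) pairs; the events surgery and recovery are invented for illustration. When one occurrence meets another, whether either holds at the shared point depends on an arbitrary endpoint convention:

```python
# Sketch of the "meeting point" problem with time points as indices.
# Occurrences are (start, end) pairs of integer time points; whether an
# occurrence "holds" at its endpoints is a modelling choice we must make.

from typing import NamedTuple


class Occurrence(NamedTuple):
    name: str
    start: int
    end: int


def holds_closed(o: Occurrence, t: int) -> bool:
    return o.start <= t <= o.end      # both endpoints included


def holds_half_open(o: Occurrence, t: int) -> bool:
    return o.start <= t < o.end       # end point excluded


def meets(a: Occurrence, b: Occurrence) -> bool:
    """Allen-style 'meets': b starts exactly when a stops."""
    return a.end == b.start


surgery = Occurrence("surgery", 10, 12)
recovery = Occurrence("recovery", 12, 20)
t = 12
print(meets(surgery, recovery))                                   # True
print(holds_closed(surgery, t), holds_closed(recovery, t))        # True True
print(holds_half_open(surgery, t), holds_half_open(recovery, t))  # False True
```

Neither convention is forced by the problem; treating intervals as primitives, with a relation like meets, sidesteps the choice.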


5.1.  On  the  exact  nature  of  the  formal  language  used 
The logical language that we use for expressing the semantics of RML is a many-sorted
first-order language, which has

•  a  non-empty  set  of  sorts  S,  which  in  our  case  are  not  assumed  to  be 
mutually  disjoint; 

•  a  potentially  infinite  collection  of  variable  symbols,  each  belonging  to  one  of 
the  sorts; 

•  for  each  n,  a  collection  of  n-ary  predicate  symbols,  each  associated  with  one 
or  more  sort  signatures,  which  are  n-ary  vectors  of  elements  in  S;  intuitively, 
each  signature  specifies  the  sort  of  the  corresponding  arguments  for  the 
predicate; 
• for each n, a collection of n-ary function symbols, each associated with one
or more sort signatures, which are (n+1)-vectors of S, and specify the sorts
of the arguments of the function as well as the sort of the function value;

• the standard formation rules using the logical symbols ∧, ∨, ¬, ⇒, ⇔, ∀ and ∃.
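The two departures from the traditional definition (overlapping sorts, several signatures per symbol) can be illustrated with a small well-sortedness check in Python. The sort contents and the predicate participates are hypothetical; the point is that john belongs to two sorts and that an atom is accepted if any one of its predicate's signatures fits.

```python
# Sketch of the modified many-sorted syntax: sorts may overlap and a
# predicate symbol may carry several sort signatures. The sort names and
# the predicate "participates" are hypothetical examples.

from typing import Dict, List, Set, Tuple

# Sorts are sets of constant symbols; they need not be disjoint ("john" is in two).
SORTS: Dict[str, Set[str]] = {
    "Token": {"john", "admit_john"},
    "Entity": {"john", "person"},
    "Event": {"admit_john"},
    "Time": {"t0", "t1"},
}

# Each predicate symbol has one or more n-ary signatures (vectors of sorts).
SIGNATURES: Dict[str, List[Tuple[str, ...]]] = {
    "participates": [("Entity", "Event", "Time"), ("Token", "Event", "Time")],
}


def well_sorted(pred: str, args: Tuple[str, ...]) -> bool:
    """An atom is well-sorted if at least one signature of pred fits its arguments."""
    return any(
        len(sig) == len(args) and all(a in SORTS[s] for a, s in zip(args, sig))
        for sig in SIGNATURES.get(pred, [])
    )


print(well_sorted("participates", ("john", "admit_john", "t0")))  # True
print(well_sorted("participates", ("t0", "admit_john", "t0")))    # False
```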


Traditionally, the sorts are supposed to be disjoint (i.e., an object is of one sort only), and
predicates/functions have a single sort signature. We have chosen this modified definition
because it considerably simplifies our task of description, and because the construction in
[Enderton 72], which shows the equivalence of ordinary many-sorted logic to standard
logic, can be trivially modified to allow for the variations used here.

We extend this language with the special binary predicate =, and its complement ~=, to
introduce the notion of identity. In writing assertions we will assume that all free variables
are universally quantified, and whenever a symbol could be either a constant or a variable,
it will be a variable unless it is expressly stated to be a constant.

In order to deal with time, we assume a special sort T whose elements are linearly
ordered by the predicates ≤T (before or at the same time as) and =T (equality for time).
For convenience, strict precedence <T for time is defined from these in the obvious way.
(Hereafter, we will omit the subscript, since the sort of the ordering or equality can always
be determined from the context.)
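Written out as first-order axioms, the linear ordering of T might look as follows. This sketch assumes ≤_T is the primitive non-strict order and defines <_T from it; the axiom names are standard order-theoretic terms, not labels from the source.

```latex
\begin{align*}
&\forall t\; (t \le_T t) && \text{reflexivity}\\
&\forall t_1 t_2\; (t_1 \le_T t_2 \wedge t_2 \le_T t_1 \Rightarrow t_1 =_T t_2) && \text{antisymmetry}\\
&\forall t_1 t_2 t_3\; (t_1 \le_T t_2 \wedge t_2 \le_T t_3 \Rightarrow t_1 \le_T t_3) && \text{transitivity}\\
&\forall t_1 t_2\; (t_1 \le_T t_2 \vee t_2 \le_T t_1) && \text{totality (linearity)}\\
&\forall t_1 t_2\; (t_1 <_T t_2 \Leftrightarrow t_1 \le_T t_2 \wedge \neg(t_1 =_T t_2)) && \text{strict precedence}
\end{align*}
```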

The remaining objects in the world are divided into four sorts: D, E, A and P,
corresponding to entities ("data"), events, assertions and properties (attributes). The first
three each have subsorts Token, Class and Metaclass, so in fact there are 10 basic sorts
altogether (3 × 3 subsorts of D, E and A, plus P). It will also be convenient to have a sort
Element representing Token ∪ Class.

The remainder of the formalization is presented in Section 3, but with the sort
information left implicit, since it is easily deducible from the context.
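The count of basic sorts can be checked mechanically. In this Python sketch, string names such as "D-Token" are our own labels, and we assume Element collects the Token and Class subsorts of all three split sorts; the special time sort T is kept apart, and Element is derived rather than counted as basic.

```python
# Sketch: enumerate the basic sorts described above. The labels are
# hypothetical; the time sort T is deliberately not included.

from itertools import product

KINDS = ["D", "E", "A"]                  # entities ("data"), events, assertions
LEVELS = ["Token", "Class", "Metaclass"]

# D, E and A are each split into three subsorts; P (properties) is not split.
basic_sorts = [f"{k}-{lvl}" for k, lvl in product(KINDS, LEVELS)] + ["P"]

# Element = Token ∪ Class: a derived sort, not an additional basic one.
element = [s for s in basic_sorts if s.endswith(("-Token", "-Class"))]

print(len(basic_sorts))  # 10
print(len(element))      # 6
```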
1. Introduction

Consider an environment (e.g. a hospital) that may have many "systems"
operating within it, some of which are computer-based or computer-assisted.
There could be a variety of system types; for instance, some might be interactive
information systems (e.g. a patient registration system), while others
might be real-time (e.g. a patient monitoring system). A requirements
specification for these systems must capture aspects of the environment which
determine each system's operation. Moreover, the specification must be understandable
by its author, as he is building it, and by others who may wish to use it,
either to understand how a hospital operates or to generate another computer
system that handles some of the hospital functions.

The development of languages, methodologies and tools for requirements
specifications is not a new subject for software engineers (e.g. [TSE77]
presents a representative sample of projects on requirements specifications).
Unlike other efforts, however, we propose to tackle the problem from the point
of view of Artificial Intelligence (AI). To us, a requirements specification is
above all a knowledge base about some slice of reality. Languages for requirements
specifications must offer facilities for the representation and organization
of knowledge into a coherent structure that is understandable by its
designer(s), by programmers building systems based on a given requirements
specification, and, ultimately, by end users of these systems. The rest of this
position paper outlines a knowledge representation framework we have adapted
for the Taxis Project at the University of Toronto as a basis for a requirements
specification language called RML; we also present some of the basic features of
RML, along with some open research questions we would like to address in the
future.

2.  A  Knowledge  Representation  Framework 

We are working with an object-oriented representational framework, thus
sharing some common premises with Simula [Dahl72] and Smalltalk [Goldberg81],
the Entity-Relationship Model [Chen76], semantic networks [QuillianBO]
and frame-based representation languages [Minsky75]. In an object-oriented
framework, a specification consists of a set of interrelated objects.
Each object is intended to stand for some entity or concept in the world or system
being modelled. The creation, modification, and manipulation of objects is
taken to represent the behavior of their counterparts in the world. A big advantage
of object-oriented frameworks is the (usually) direct correspondence
between the specification and the world.
Our representational framework was particularly influenced by semantic
networks [QuillianRB, Findler79], which have been popular representational
frameworks in AI for more than a decade.

One of the advantages of basing our work on existing results from
Knowledge Representation is that we have (at least partial) answers for issues
that have barely been addressed in Software Engineering, such as:

-  formal  accounts  of  rich  representational  frameworks; 

-  logical  consistency  of  and  deduction  within  knowledge  bases; 

-  representation  of  uncertainty  and  incompleteness  of  knowledge; 

-  representation  of  temporal  knowledge; 

- representation of default knowledge, as well as exceptions and what to do with them;

-  philosophical  underpinnings  of  a  representation. 

Our framework itself derives from Procedural Semantic Networks [Levesque79]
and has the benefit of borrowing insights and results from this and related work
in AI.

The second basic ingredient of the framework is a set of abstraction
mechanisms, which are used to structure and organize large specifications.

An abstraction mechanism is a conceptual tool that allows certain information
to be highlighted while other information is suppressed. In Software
Engineering, abstraction is usually equated with the suppression of design decisions
on implementation detail, and is therefore orthogonal and complementary
to the mechanisms discussed here. Our abstraction facilities are based on
the notions of aggregation, classification, and generalization.

If we define a property to be a directed relationship between two objects,
aggregation allows one to view an object as a composite of the objects to which
it is related by properties. For example, a person can be thought of as a composite
object consisting of a name, an address, and so on. The "abstraction" here is
that one may talk about an object while choosing to ignore its components for
the moment.
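Aggregation can be sketched in Python as objects holding named references to other objects. The Obj class and the sample address are hypothetical; the sketch only shows that one can refer to the composite by itself or, when needed, look at its components.

```python
# Minimal sketch of aggregation: an object is viewed as a composite of the
# objects it is related to by properties (directed relationships). The
# class, property names and values are illustrative only.

from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Obj:
    name: str
    # property name -> the object it points to
    properties: Dict[str, "Obj"] = field(default_factory=dict)

    def components(self) -> Dict[str, str]:
        """Look 'inside' the aggregate: one level of its components."""
        return {p: o.name for p, o in self.properties.items()}


john = Obj("John Smith", {
    "name": Obj('"John Smith"'),
    "address": Obj("10 King St."),
})
print(john.name)          # talk about the object while ignoring its parts
print(john.components())  # or look at its components when needed
```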

Along an orthogonal dimension, the classification abstraction allows
objects that share common properties, e.g. the objects representing John
Smith, Mary Brown and Harry Wong, to be grouped into a class where we can
describe their commonalities. A class represents a generic concept, such as
"person" or "employee", and it also serves as a template for its members, which
are called its instances. With each class we associate structural properties, which
express such information as "persons have a name, an age, an address" and so
on, as well as constraints on the values that can be assigned to these properties for
an instance of the class. Instances have factual properties which, for example,
attribute specific names, ages, etc. to individual persons. The "abstraction"
here is that one may describe a class without referring to its instances. Note
that classes are themselves objects and can therefore be instances of other
classes (metaclasses), which are also instances of yet other classes, and so on.
Indeed, every object must be an instance of at least one class.
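Classification, including classes that are instances of metaclasses, can be sketched with a single instance-of relation in Python. The object and class names are hypothetical, and the self-instance top_class is an assumption of this sketch, used only to keep the example finite; it is not a claim about how RML ends the chain.

```python
# Sketch of classification: classes are themselves objects, so they can be
# instances of (meta)classes. All names below are hypothetical.

from typing import Dict, Set

# object -> the classes it is an instance of
instance_of: Dict[str, Set[str]] = {
    "john_smith": {"person"},
    "mary_brown": {"person"},
    "person": {"person_class"},      # a class is an instance of a metaclass
    "employee": {"person_class"},
    "person_class": {"top_class"},   # ...and so on upward
    "top_class": {"top_class"},      # hypothetical: closes the chain in this finite example
}


def instances(cls: str) -> Set[str]:
    """All objects recorded as instances of cls."""
    return {o for o, classes in instance_of.items() if cls in classes}


# Every object must be an instance of at least one class.
assert all(classes for classes in instance_of.values())
print(sorted(instances("person")))        # ['john_smith', 'mary_brown']
print(sorted(instances("person_class")))  # ['employee', 'person']
```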

Generalization  allows  the  common  properties  of  several  classes  to  be 
abstracted  into  the  definition  of  a  single,  more  general,  class.  For  example,  the 
class of persons can be represented as a generalization of the classes
representing  males,  females,  managers,  engineers,  female  engineers,  and  so 
on. 

Accordingly, classes are organized into a hierarchy with general classes
located above their specializations. If one class is defined to be a specialization
of another, every instance of the first must also be an instance of the second,
e.g., every instance of the class of employees is an instance of the class of persons.
Another important consequence of this organization is that (structural)
properties can be inherited from a class by its specializations, e.g., the class of
employees inherits properties such as name, address, and so on, from the class
of persons.
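Property inheritance along an IS-A hierarchy amounts to collecting locally declared properties from a class and all its generalizations. A minimal Python sketch, with hypothetical class and property names drawn from the example above:

```python
# Sketch of an IS-A hierarchy with property inheritance. Each class lists
# only its locally declared (structural) properties; the rest are inherited.
# Class and property names are illustrative.

from typing import Dict, List, Set

is_a: Dict[str, List[str]] = {          # class -> its direct generalizations
    "person": [],
    "employee": ["person"],
    "engineer": ["employee"],
    "female": ["person"],
    "female_engineer": ["engineer", "female"],
}
local_props: Dict[str, Set[str]] = {
    "person": {"name", "address", "age"},
    "employee": {"salary"},
    "engineer": {"specialty"},
    "female": set(),
    "female_engineer": set(),
}


def ancestors(cls: str) -> Set[str]:
    """All classes at or above cls (reflexive-transitive closure of IS-A)."""
    seen, stack = {cls}, [cls]
    while stack:
        for sup in is_a[stack.pop()]:
            if sup not in seen:
                seen.add(sup)
                stack.append(sup)
    return seen


def all_props(cls: str) -> Set[str]:
    """Local properties plus everything inherited from generalizations."""
    return set().union(*(local_props[c] for c in ancestors(cls)))


print(sorted(all_props("female_engineer")))
# ['address', 'age', 'name', 'salary', 'specialty']
```

Declaring name, address and age once, on person, is what makes the model concise: every specialization picks them up without restating them.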

Generalization hierarchies are also referred to as IS-A hierarchies in AI.
The taxonomic organization provided by IS-A hierarchies can lead to models
that are more understandable, in the sense that the distance between two classes in
an IS-A hierarchy is related to the "similarity" of the two classes.
Also, IS-A hierarchies can lead to more concise models, since it is sufficient to
associate (structural) properties with the most general applicable class and let
inheritance take care of the rest.

The  name  "IS-A  hierarchies"  arose  from  sentences  in  English  such  as  "an 
employee  is  a  person"  and  "John  is  a  person".  One  should  be  careful,  however, 
to  distinguish  between  these  two:  the  first  expresses  an  IS-A  relationship 
between  two  classes,  while  the  second  expresses  a  relationship  between  the 
class  of  persons  and  one  of  its  instances. 
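The distinction can be kept explicit by storing the two relations separately. In this Python sketch (names hypothetical), "an employee is a person" is checked with is_subclass, "John is a person" with is_instance, and instance membership is propagated upward through IS-A:

```python
# Sketch keeping the two readings of "is a" apart: IS-A relates two classes,
# instance-of relates a token to a class. Membership then propagates up
# the IS-A hierarchy. All names are illustrative.

from typing import Dict, Set

IS_A: Dict[str, Set[str]] = {"employee": {"person"}, "person": set()}
INSTANCE_OF: Dict[str, Set[str]] = {"john": {"employee"}}


def superclasses(cls: str) -> Set[str]:
    """cls together with every class above it."""
    out = {cls}
    for sup in IS_A.get(cls, set()):
        out |= superclasses(sup)
    return out


def is_instance(token: str, cls: str) -> bool:
    """'John is a person': instance-of, closed under IS-A."""
    return any(cls in superclasses(c) for c in INSTANCE_OF.get(token, set()))


def is_subclass(sub: str, sup: str) -> bool:
    """'An employee is a person': IS-A between classes."""
    return sup in superclasses(sub)


print(is_subclass("employee", "person"), is_instance("john", "person"))  # True True
print(is_subclass("john", "person"))  # False: john is a token, not a class
```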

A  fundamental  premise  of  our  research  states  that  this  simple  framework 
based  on  three  abstraction  mechanisms  can  be  used  effectively  to  structure 
any  kind  of  knowledge  and  is  therefore  adequate  for  requirements 
specifications.*

3.  Some  Features  of  RML 

The objects of an RML specification are partitioned into tokens, classes and
metaclasses depending on the world entity or concept they represent. Tokens
represent particular (physical or abstract) entities such as the person John
Smith or the number 7. Classes represent concepts that have particular entities
as instances, such as the concepts "person" and "number". Metaclasses
have classes as instances. For example, the metaclass "person-class" has
classes such as "person", "employee", "manager" etc. as instances and
describes their common characteristics (e.g. each one of them has an "average
age" property). Three primitive metametaclasses available to the RML user help
with the uniformity of the representation.

Along an orthogonal dimension, RML offers three kinds of classes:

entity: represents a concept whose instances participate in events and whose
properties are affected by these events, e.g. "person";

activity: describes an instantaneous or long-term event that affects the
state of the world, e.g. "admit-to-hospital";

assertion: a statement about the world used to describe either an entity
or an activity class, e.g. the precondition "patient must have a hospital
insurance number", which might be associated with the "admit-to-hospital"
activity.

* Actually, the framework is also used for design specifications via the design language
Taxis, but we limit the discussion here to requirements specifications.

The three abstraction mechanisms offered by the representational framework
are used to organize all three kinds of classes. For instance, the components
of an entity class, e.g. "patient", include other entity classes, e.g. "doctor"
and "hospital-insurance-#"; activities that specify how patients are
inserted into and removed from a specification; and assertions on the property
values of a patient, e.g. "the patient's doctor must be on the hospital
staff". Similar remarks apply to activity and assertion classes, whose components
are other entity, activity and assertion classes.

Classes of all three kinds are also organized into an IS-A hierarchy where,
for example, the activity class "admit-to-hospital" might be placed above the
activity classes "admit-VIP-to-hospital" and "emergency-admit-patient-to-hospital".

RML was designed with the view that in requirements specifications we
want to describe WHAT is supposed to happen without describing HOW. Thus RML
does not offer control flow primitives. Instead, its activities are specified in
terms of preconditions, postconditions, input, output and control objects, as well as
other component activities. The design of RML was considerably influenced by
SADT* [Ross77], and we have studied the mapping from SADT to RML in order to
identify similarities and differences between the two notations [Greenspan83b].
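An activity described by WHAT rather than HOW can be sketched as a bundle of assertions that any occurrence must satisfy. This Python sketch is hypothetical: the Activity class, its fields and the state dictionaries are our own illustration, using the "patient must have a hospital insurance number" precondition mentioned earlier. It checks a proposed state change rather than prescribing how to perform one.

```python
# Hypothetical sketch of an activity class described by WHAT it does
# (precondition, postcondition, inputs, outputs) rather than HOW: no
# control flow is given, only assertions that any occurrence must satisfy.

from dataclasses import dataclass
from typing import Callable, Dict

State = Dict[str, object]


@dataclass(frozen=True)
class Activity:
    name: str
    inputs: tuple
    outputs: tuple
    precondition: Callable[[State], bool]
    postcondition: Callable[[State, State], bool]

    def permits(self, before: State, after: State) -> bool:
        """Does this state change satisfy the activity's assertions?"""
        return self.precondition(before) and self.postcondition(before, after)


admit = Activity(
    name="admit-to-hospital",
    inputs=("patient",),
    outputs=("admission-record",),
    precondition=lambda s: s.get("insurance_no") is not None,
    postcondition=lambda b, a: a.get("admitted") is True,
)

ok = admit.permits({"insurance_no": "123"}, {"insurance_no": "123", "admitted": True})
bad = admit.permits({"insurance_no": None}, {"admitted": True})
print(ok, bad)  # True False
```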

4.  Taxonomic  Stepwise  Refinement 

A methodology for specification/modelling should provide guidance to its
users. At the heart of many software development methodologies lie one or
more abstraction mechanisms, which allow us to ignore details at some level,
along with a refinement principle, which allows for the guided and gradual
reintroduction of details across the abstraction dimension. The aggregation
abstraction forms the core of software design methodologies such as "stepwise
refinement" [Wirth71]. Similarly, the "implementation" dimension (the one
which includes as "points" the requirements specification level, the design
level, the implementation level, or variations thereof) is the basis for
approaches such as the abstract machine approach [Dijkstra7<?]. The generalization
abstraction has not been exploited very much in Software Engineering
(although Simula does have classes with a rudimentary inheritance mechanism).
Yet we believe that it is an invaluable organizational tool for system
description in general, and for requirements modelling in particular.

The  main  idea  of  specification  guided  by  generalization  is  that  a 
specification  can  be  constructed  by  modelling  first  the  most  general  concepts, 
and then proceeding to more specialized ones. For example, in modelling a hospital
world, one might consider first the concepts of patient, doctor, admission,
treatment,  etc.  Later,  the  modeller  can  differentiate  between  child  patients, 
heart  patients,  internists  and  surgeons,  surgical  and  medical  treatments,  etc. 
At  each  step,  only  the  information  (properties)  appropriate  to  that  level  need  be 
considered. 

Generalization is the appropriate organizational principle to rely on when
the difficulty of modelling is due to the large number of similar concepts that
need to be represented and integrated into a coherent specification rather than
to the complexity of the concepts involved. It is our contention that many
software development projects in applications that involve interactive information
systems deal precisely with this type of situation. Hence,
we feel that a specification methodology based on generalization, which we
might call taxonomic stepwise refinement, has a prominent role to play in
Software Engineering. This theme is further examined in [Borgida83a]. Note
that we view this methodology as orthogonal and complementary to the much
studied and discussed "stepwise refinement by decomposition", whose main
emphasis is on the decomposition of complex situations into a number of
"simpler" ones. Moreover, both kinds of refinement are orthogonal and complementary
to a third dimension, which effects the introduction of implementation
detail as one moves from requirements specifications to implementations.

* SADT is a trademark of SofTech, Inc.

5.  Some  Representation  Issues 

One should not conclude from the earlier discussion that the "knowledge
representation problem" has been solved and all that is left is for software
engineers to apply the solutions to new situations. Although there has been progress,
much remains to be done in the areas of knowledge organization and the
representation of semantic notions such as time and causality, action and
intention. More importantly perhaps, these notions have to be integrated into a
formal language that is suitable for the construction of large knowledge
bases.* An overview of the area can be found in [Mylopoulos83]. The rest of
this section presents brief introductions to three representation problems we
consider particularly important for requirements specifications.

5.1.  Incompleteness 

It is rarely the case that a person, or system, has complete knowledge of a
slice of reality he/she/it has an interest in. For our hospital example, it may be
fair to assume that all patients (for a given hospital) are known, but that is not
a fair assumption for doctors (who may or may not be affiliated with the hospital).
We may want to represent explicitly in a requirements specification what is
completely known (the hospital patients) and what isn't (the doctors). The
Closed World Assumption [Reiter83] for a given class states that all instances of
the concept it represents are known. This assumption is only the tip of the iceberg
as far as incompleteness is concerned. We may want to represent
knowledge such as:

"there are at least two unknown doctors"

or

"patients with unknown hospital insurance numbers will be admitted
but must present their number within a day; patients without
such numbers will not be admitted".

[Levesque81] addresses such knowledge incompleteness issues from a formal
point of view.
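The per-class Closed World Assumption can be sketched in Python as a three-valued membership query. The data and the closed set are hypothetical, following the hospital example: patients are a closed class, doctors an open one. Statements such as "at least two unknown doctors" need a richer representation than this sketch provides.

```python
# Sketch of a per-class Closed World Assumption. For a "closed" class,
# anything not recorded is not an instance; for an "open" class, it is
# merely unknown. Class names follow the hospital example; the data is made up.

from enum import Enum
from typing import Dict, Set


class Truth(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"


known: Dict[str, Set[str]] = {"patient": {"mary", "raj"}, "doctor": {"dr_lee"}}
closed: Set[str] = {"patient"}          # all patients are known; doctors are not


def is_instance(x: str, cls: str) -> Truth:
    if x in known.get(cls, set()):
        return Truth.TRUE
    return Truth.FALSE if cls in closed else Truth.UNKNOWN


print(is_instance("ann", "patient").value)  # false
print(is_instance("ann", "doctor").value)   # unknown
```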


* An analogy can be drawn here with APL: the mathematical theory of matrices had been known
for a long time; nevertheless, it took considerable ingenuity to design a programming
language based on that theory.
5.2. Exceptions

There are no rules without exceptions. The procedures, rules and regulations
represented in a specification are bound to be contradicted by the events
they are intended to predict. The systems that are based on a given
specification must have the flexibility to (a) allow exceptions to specified constraints,
and (b) handle these exceptions in an effective manner. For Knowledge
Representation, these requirements mean that we must provide the designer of
a specification with tools that explicitly address this issue. Although some work
has been done on this from a Programming Languages point of view, e.g.
[Goodenough77], and some attention has been paid within AI [Minsky75], exceptions
remain a largely unexplored research question, despite their importance.
[Borgida83b] presents further discussion of this issue.
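Requirements (a) and (b) can be sketched in Python as constraints that carry their own exception handlers: a violation is logged and handled instead of blocking the activity. The Constraint and Monitor classes are hypothetical, and the handler borrows the insurance-number example from Section 5.1.

```python
# Hypothetical sketch of requirements (a) and (b): a violated constraint is
# not an outright rejection; the violation is recorded as an exception and
# passed to a handler. The constraint and handler are illustrative.

from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Constraint:
    name: str
    check: Callable[[Dict], bool]     # True when the constraint is satisfied
    handler: Callable[[Dict], str]    # what to do when it is not


@dataclass
class Monitor:
    constraints: List[Constraint]
    log: List[str] = field(default_factory=list)

    def admit(self, patient: Dict) -> None:
        """Admit the patient; violations are (a) allowed and (b) handled."""
        for c in self.constraints:
            if not c.check(patient):
                self.log.append(f"{c.name}: {c.handler(patient)}")


has_insurance = Constraint(
    name="has-insurance-number",
    check=lambda p: p.get("insurance_no") is not None,
    handler=lambda p: f"admit {p['name']}, number due within a day",
)
m = Monitor([has_insurance])
m.admit({"name": "ann", "insurance_no": None})
print(m.log)  # ['has-insurance-number: admit ann, number due within a day']
```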

5.3.  Relativism 

It is well known that one of the major problems in constructing a
specification is the multiplicity of viewpoints that need to be accounted for. In
a hospital setting, the software engineer who is trying to find out how things are
done at the hospital, before building a specification, may get very different
accounts of how a patient is admitted depending on whether he talks to the nurse
who does the admissions or to the nurse's supervisor. Sometimes the
differences can be resolved before a specification is built. Other times, however,
contradictions between different viewpoints are fundamental and should be part
of the specification. This introduces yet another demand on the representation
language in use: it should allow for the possibility that a specification is a
collection of viewpoints, each of which is a specification in its own right. It should
also facilitate the description of relationships between the components of these
viewpoints.
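One way to meet this demand is to keep each viewpoint as a separate specification and describe relationships between them explicitly. In this Python sketch the viewpoints, activity names and steps are hypothetical, and the only relationship computed is disagreement on how an activity is described; the contradiction is recorded rather than resolved.

```python
# Sketch of a specification as a collection of viewpoints, each itself a
# (tiny) specification: here, a map from activity name to its ordered steps.
# The viewpoints and steps are hypothetical.

from typing import Dict, List

Spec = Dict[str, List[str]]

viewpoints: Dict[str, Spec] = {
    "admitting_nurse": {"admit-patient": ["check-insurance", "assign-bed", "notify-doctor"]},
    "supervisor": {"admit-patient": ["assign-bed", "check-insurance"]},
}


def disagreements(a: Spec, b: Spec) -> Dict[str, tuple]:
    """Relate two viewpoints: activities that both describe, but differently."""
    return {act: (a[act], b[act]) for act in a.keys() & b.keys() if a[act] != b[act]}


diff = disagreements(viewpoints["admitting_nurse"], viewpoints["supervisor"])
print(sorted(diff))  # ['admit-patient']
```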

6.  Conclusions 

In summary, we are advocating several positions which we consider important
for Software Engineering in general and requirements modelling research
in particular:

- Requirements specification languages can and should be based on
Knowledge Representation research.

- There exists a set of "universal" abstraction mechanisms that can be used
to structure any kind of knowledge, static and dynamic, procedural and
declarative, syntactic, semantic and pragmatic. Our version of this set
includes generalization, aggregation and classification, but, at least in
principle, there could be other versions.

- Taxonomic stepwise refinement is an important methodology for requirements
(as well as design) specifications in applications involving large
masses of simple detail rather than a few complex concepts; many of the
database-oriented applications dealt with by software engineers today fall
into this category.

The Taxis project has developed a variety of languages and tools for
requirements and design, based on the positions outlined above. A summary of
the project's current status can be found in [Greenspan83]. Other work in the
same spirit includes [Bubenko81, MittermeirOO, Roussopoulos79, Wilson79,
Yeh80].
Abstract. We take the position that many of the problems pertaining to
models and languages for software specification and design can be
approached from the standpoint of Knowledge Representation (KR)
research in Artificial Intelligence (AI). Some of the work we have done
based on this view was presented previously (e.g. [4,5]). Here, we go
a step further and discuss some of the specific open
problem areas that will eventually need to be faced in Software Engineering
and for which AI offers some direction.


Software development can be viewed as the production of a series of models, starting
with requirements and progressing through design and implementation. As the first model, a
requirements specification is viewed as encompassing a model of the world that captures as
much as possible of the relevant information pertaining to the design of some software
systems. Successive software development steps transform this model into a design and
then an implementation.¹

A longstanding and pressing problem in Software Engineering is how to capture the
needed information about the application domain in a way that is both (a) natural and
direct and (b) formal, the former to facilitate understandability and constructibility, the
latter to support formal analyses and automated tools. Natural language has proved
inadequate with respect to the latter, and formal specification languages with respect to
the former. What is needed are languages, tools, and methodologies for encoding detailed
and complex knowledge about the world.

We are addressing these problems from the point of view of Artificial Intelligence
(AI). Our position is that an effective way to tackle the problems is by applying results of
Knowledge Representation (KR) research in AI.

To  us,  a  requirements  specification  is  above  all  a  knowledge  base  that  captures  the 
semantics  of  some  slice  of  reality.  Languages  for  requirements  specifications  must  offer 
facilities  for  the  representation  and  organization  of  knowledge  into  a  coherent  structure 
that  is  understandable  by  its  designer(s),  by  programmers  building  systems  based  on  a  given 
requirements  specification,  and  by  end  users  of  these  systems. 


1 This view is consistent in spirit with Jackson's [7].

Robert G. Babb II and Ali Mili (editors),

International  Workshop  on  Models  and  Languages  for  Software  Specification 
and  Design,  Workshop  Notes,  Orlando,  FL,  March  30,  1984,  pp.  35-38. 
As part of the Taxis Project at the University of Toronto, we have designed a
requirements  language,  RML  [4,6],  based  on  KR  concepts,  semantic  networks  in  particular. 
RML modeling is followed by system design in the Taxis language [10],¹ which also
represents  the  semantics  of  entities  in  the  world  but,  in  essence,  replaces  RML's  structured 
network  of  constraints  with  a  structured  network  of  data  structures  and  executable 
expressions.  Taxis  programs  can  be  compiled  into  a  standard  programming  language  plus 
relational database facilities.²

Our  experience  with  RML  and  Taxis  reinforces  our  belief  that  the  KR  concepts  are 
applicable  to  Software  Engineering  problems.  In  particular,  we  claim  that  the  following 
principles  underlying  our  KR  approach  to  software  development  are  beneficial: 

i. Use of an object-oriented framework, in which each unit of specification, called an
object,  is  meant  to  represent  one  concept  of  the  domain  of  discourse,  and  objects 
are  defined  in  terms  of  their  relationships  with  other  objects.  RML  and  Taxis  each 
offer  object  and  relationship  types  that  are  appropriate  to  their  purposes. 

ii.  Use  of  abstraction  principles  to  organize  specifications.  RML  and  Taxis  are  based 
on a framework that uses aggregation, classification, and generalization (IS-A
hierarchies), which have gained wide acceptance in areas such as AI and Data
Bases.  (See  [14].) 

iii.  Use  of  assertions  as  a  basic  unit  of  information;  RML  offers  assertions  for 
expressing  a  variety  of  facts  and  constraints,  including  preconditions, 
postconditions,  invariants,  initial  conditions,  etc.  Assertions  are  treated  just  like 
other  objects,  i.e.  they  are  organized  according  to  the  abstraction  principles. 

iv.  Existence  of  formal  underpinnings,  while  sheltering  the  modeler  from  the  mathematical 
formalism;  RML  is  formally  defined  by  a  translation  into  First-Order  Logic,  but  the  user 
of  the  language  only  deals  with  objects. 
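As a rough illustration of principles ii and iii, the following Python sketch treats assertions as named objects attached to classes, so that an assertion declared on a general class is inherited along the IS-A hierarchy. It is not RML syntax: the `Assertion` and `ObjectClass` types, the `Person` and `Patient` classes, and the predicates are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Assertion:
    """An assertion treated as an object in its own right (hypothetical encoding)."""
    name: str
    kind: str                      # e.g. "invariant", "precondition"
    check: Callable[[dict], bool]  # predicate over an instance's attribute values


@dataclass
class ObjectClass:
    name: str
    is_a: Optional["ObjectClass"] = None
    assertions: List[Assertion] = field(default_factory=list)

    def all_assertions(self) -> List[Assertion]:
        # Generalization: assertions of every IS-A ancestor apply to this class too.
        inherited = self.is_a.all_assertions() if self.is_a else []
        return inherited + self.assertions

    def violations(self, instance: dict) -> List[str]:
        return [a.name for a in self.all_assertions() if not a.check(instance)]


person = ObjectClass("Person", assertions=[
    Assertion("has_name", "invariant", lambda x: bool(x.get("name")))])
patient = ObjectClass("Patient", is_a=person, assertions=[
    Assertion("admitted_has_ward", "invariant",
              lambda x: not x.get("admitted") or "ward" in x)])

print(patient.violations({"name": "", "admitted": True}))
# ['has_name', 'admitted_has_ward']
```

Because `Patient` inherits `has_name` from `Person`, the invariant is stated once and checked wherever it applies.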

To  illustrate  the  difficult  issues  that  we  believe  must  eventually  be  addressed  for 
"models  and  languages  for  software  specification  and  design"  and  are  currently  being 
addressed in AI, let us consider only a few of the representation problems dealt with in KR
research:³

1  Exceptions  and  defaults.  There  are  no  rules  without  exceptions.  The  procedures,  rules  and 
regulations  represented  in  a  specification  are  bound  to  be  contradicted  by  the  events  they 
are  intended  to  predict.  The  systems  that  are  based  on  a  given  specification  must  have  the 
flexibility  to  allow  exceptions  to  specified  constraints.  While  some  attention  has  been  paid 
to exceptions from a Programming Languages point of view, e.g. [3], and also from within AI
[9], exceptions remain a largely unexplored research question, despite their importance.
Borgida  [2]  presents  further  discussion  on  this  issue. 
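One way to make this flexibility concrete is to let a constraint hold by default while recording explicitly sanctioned exceptions instead of rejecting them. The Python below is a minimal sketch of that idea only, not the mechanism discussed in [2]; the `DefaultConstraint` class, the visiting-hours rule and the instance identifiers are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Set


@dataclass
class DefaultConstraint:
    """A constraint that normally holds but admits recorded exceptions (hypothetical)."""
    name: str
    holds: Callable[[dict], bool]
    excused: Set[str] = field(default_factory=set)  # ids of instances granted an exception

    def excuse(self, instance_id: str) -> None:
        # Record a sanctioned exception instead of refusing the update outright.
        self.excused.add(instance_id)

    def status(self, instance_id: str, instance: dict) -> str:
        if self.holds(instance):
            return "ok"
        return "excused" if instance_id in self.excused else "violation"


visiting_hours = DefaultConstraint("visits_end_by_20h", lambda v: v["hour"] < 20)
visiting_hours.excuse("visit-42")  # e.g. a family visiting a critically ill patient

print(visiting_hours.status("visit-41", {"hour": 21}))  # violation
print(visiting_hours.status("visit-42", {"hour": 21}))  # excused
```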

2  Uncertainty  and  incompleteness.  It  is  rarely  the  case  that  persons  or  systems  have 
complete  knowledge  of  a  slice  of  reality  in  which  they  have  an  interest.  For  example,  it  may 
be  fair  to  assume  that  all  patients  (for  a  given  hospital)  are  known,  but  that  is  not  a  fair 
assumption  for  doctors  (who  may  or  may  not  be  affiliated  with  the  hospital).  We  may  want 
to  represent  explicitly  in  a  model  what  is  completely  known  (the  hospital  patients)  and  what 
is not (the doctors). The Closed World Assumption [15] for a given class states that all
instances  of  the  concept  it  represents  are  known.  This  assumption  is  only  the  tip  of  the 


1 Taxis is intended for the design of information systems.

2 A translator has been written [12] from Taxis to Pascal/R.

3 An overview of KR can be found in [11].
iceberg as far as incompleteness is concerned. We may want to represent knowledge such
as: 


"there  are  at  least  two  unknown  doctors." 

"patients  with  unknown  hospital  insurance  numbers  will  be  admitted  but  must 
present  their  number  within  a  day;  patients  without  such  numbers  will  not  be 
admitted" 

Levesque  [8]  addresses  such  knowledge  incompleteness  issues  from  a  formal  point  of  view. 
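The distinction between closed and open classes, and a fact such as "there are at least two unknown doctors", can be sketched with three-valued membership queries. This is a minimal Python illustration assuming a per-class closed/open flag; it is not Levesque's formalism, and the class and method names are hypothetical.

```python
from enum import Enum


class Truth(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"


class KnowledgeBase:
    """Per-class Closed World Assumption (hypothetical sketch)."""

    def __init__(self):
        self.instances = {}    # class name -> set of known instances
        self.closed = set()    # classes whose instances are all known
        self.unknown_min = {}  # class name -> lower bound on unnamed instances

    def declare(self, cls, closed=False):
        self.instances.setdefault(cls, set())
        if closed:
            self.closed.add(cls)

    def add(self, cls, individual):
        self.instances[cls].add(individual)

    def at_least_unknown(self, cls, n):
        # Records "there are at least n instances of cls that we cannot name".
        if cls in self.closed:
            raise ValueError(f"{cls} is closed: all of its instances are known")
        self.unknown_min[cls] = max(self.unknown_min.get(cls, 0), n)

    def is_instance(self, cls, individual):
        if individual in self.instances[cls]:
            return Truth.TRUE
        # Absence implies "no" only for a closed class; otherwise we simply do not know.
        return Truth.FALSE if cls in self.closed else Truth.UNKNOWN

    def min_count(self, cls):
        return len(self.instances[cls]) + self.unknown_min.get(cls, 0)


kb = KnowledgeBase()
kb.declare("Patient", closed=True)
kb.declare("Doctor")
kb.add("Patient", "smith")
kb.add("Doctor", "jones")
kb.at_least_unknown("Doctor", 2)

print(kb.is_instance("Patient", "brown").value)  # false
print(kb.is_instance("Doctor", "brown").value)   # unknown
print(kb.min_count("Doctor"))                    # 3
```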

3 Multiple viewpoints. It is well known that one of the major problems in constructing a
specification is the multiplicity of viewpoints that need to be accounted for. In a hospital
setting, the software engineer who is trying to find out how things are done at the hospital,
before building a specification, may get very different accounts of how a patient is admitted
to the hospital depending on whether he talks to the nurse who does the admissions or to the
nurse's supervisor. Sometimes the differences can be resolved before a specification is built. Other
times,  however,  contradictions  between  different  viewpoints  are  fundamental  and  should  be 
part  of  the  specification.  This  introduces  yet  another  demand  on  the  representation 
language in use. It should allow for the possibility that a specification is a collection of
viewpoints,  each  of  which  is  a  specification  in  its  own  right.  It  should  also  facilitate  the 
description  of  relationships  between  the  components  of  these  viewpoints. 
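A specification organized as a collection of viewpoints can be sketched by keeping each viewpoint as its own small fact base and computing one relationship between them: the facts on which they disagree. The Python below is a hypothetical illustration; the viewpoint names and fact keys are invented, and real viewpoints would relate much richer structures than key/value pairs.

```python
from collections.abc import Hashable
from typing import Dict


def disagreements(viewpoints: Dict[str, Dict[str, Hashable]]) -> Dict[str, Dict[str, Hashable]]:
    """Return, for each fact stated by two or more viewpoints with different values,
    what each viewpoint says about it. The viewpoints themselves are left unchanged."""
    keys = set().union(*(vp.keys() for vp in viewpoints.values()))
    result = {}
    for key in sorted(keys):
        answers = {name: vp[key] for name, vp in viewpoints.items() if key in vp}
        if len(set(answers.values())) > 1:
            result[key] = answers
    return result


hospital = {
    "admitting_nurse": {"insurance_no_required": False, "form": "A-1"},
    "supervisor": {"insurance_no_required": True, "form": "A-1"},
}
print(disagreements(hospital))
# {'insurance_no_required': {'admitting_nurse': False, 'supervisor': True}}
```

Keeping the conflict explicit, rather than overwriting one account with the other, is what lets a fundamental contradiction remain part of the specification.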

4  Flexible  user  interfaces. The  degree  of  flexibility  of  user  interfaces  depends  on  the  extent 
to  which  one  can  model  the  behavior  of  user/machine  interaction  and  the  patterns  of 
language  that  are  used  for  communication.  For  Taxis,  interaction  is  accomplished  using  a 
Petri-Net-like  process  description  formalism  combined  with  send/receive  communication 
primitives  [1].  Linguistic  aspects  are  represented  using  a  scheme  for  defining  classes  of 
lexical  and  grammatical  objects,  whose  semantics  are  given  by  a  mapping  between  these 
objects  and  the  objects  in  the  database  [13]. 

5  Evolving  models. In  order  to  deal  with  specifications/models  (including  software)  over  their 
lifetimes,  it  is  best  to  treat  them  as  "evolving"  rather  than  as  being  "maintained".  A  topic  of 
relevance  is  "knowledge  acquisition",  which  is  concerned  with  the  effect  of  adding  new 
knowledge  to  an  existing  knowledge  base.  An  important  issue  is  how  to  incorporate  facts 
that  may  contradict  what  is  already  present. 

In  summary,  Software  Engineering  must  address  the  problem  of  how  to  capture 
knowledge about the world in a form usable by both persons and machines.¹ Moreover, if the
area  of  Software  Engineering  intends  to  deal  with  the  construction  of  systems  that  exhibit 
such  attributes  as  "flexibility",  "friendliness",  or  "intelligence",  richer 
representation/modeling  schemes  will  be  needed. 

One  should  not  conclude  from  this  discussion,  however,  that  the  "knowledge 
representation  problem"  has  been  solved  and  all  that  is  left  is  for  software  engineers  to 
apply  the  solutions  to  new  situations.  Although  there  has  been  progress,  much  remains  to  be 
done  in  areas  such  as  knowledge  organization,  knowledge  acquisition,  and  the 
representation  of  semantic  notions  such  as  time  and  causality,  action,  and  intention.  More 
importantly,  these  notions  have  to  be  integrated  into  a  formal  language  that  is  suitable  for 
the  construction  of  large  knowledge  bases. 


1 As a special case of particular importance to Software Engineering, consider the construction of software tools
such as those for computer-aided programming, maintenance, etc.; these systems need knowledge bases about
programming  concepts  and  methods. 
I. Overview

We are interested in the design of interactive information systems such as ones for airline
reservations, on-line inventory control, project management and office systems. Such systems
consist of possibly several databases, applications programs and user interfaces. Existing
tools  for  building  them  are  woefully  inadequate  in  that  they  deal  with  either  the  problem  of 
building  a  database  or  that  of  building  applications  programs  but  seldom  both.  Perhaps  more 
significantly,  almost  no  attention  has  been  paid  to  the  problem  of  building  user  interfaces, 
despite  the  fact  that  a  high  proportion  of  total  information  system  development  costs  involve 
the  interface  with  users  of  the  application,  including  various  dialogues,  input  formats,  error 
checking,  help  and  explanation  messages  and  the  like.  Indeed,  some  studies  indicate  that  the 
cost  for  the  maintenance  of  user  interfaces  may  be  as  high  as  60%  of  the  total  maintenance 
cost  [Lientz  and  Swanson  81]. 

It  is  true  that  the  maintenance  problem  has  been  the  target  of  many  suggestions  for 
improvements.  Structured  programming,  automated  documentation,  abstract  data  types,  and 
the  like  are  tools  that  have  been  proposed  to  make  a  given  application  easier  to  understand 
and to modify after its initial development. Nevertheless, it is also true that traditional
programming language constructs are strongly oriented towards improving the programmer’s
effectiveness  in  developing  the  algorithmic  and  data  manipulation  aspects  of  an  application. 
When  it  comes  to  building  the  interfaces  for  his  program,  the  programmer  is  basically  left  to 
develop  his  own  devices. 

We  approach  the  design  problem  for  interactive  information  systems  from  a  knowledge 
representation  point  of  view.  To  us,  such  a  system  is  above  all  a  knowledge  base  of  "facts". 
Some  involve  the  outside  world,  its  entities,  their  interrelations,  the  events  they  participate  in 
and  their  histories  ("semantic"  knowledge).  Others  are  about  the  grammar  and  the  lexicon 
used  for  a  particular  user  interface  ("syntactic"  knowledge).  Yet  others  are  about  the  dialogue 
structure  the  system  is  expected  to  support  ("pragmatic"  knowledge).  Above  all,  the  task  of  the 
information  system  designer  is  to  integrate  the  semantic,  syntactic  and  pragmatic  knowledge 
into  a  coherent  system.  To  succeed  in  such  a  formidable  task,  the  designer  must  have 
knowledge representation tools which allow him to maintain a good understanding of his
system as it is developing. Uniformity of representation is all-important here. Efficiency is only
to  be  dealt  with  at  a  later  stage  of  system  development. 

The  premises  outlined  above  form  the  basic  design  principles  for  the  Taxis 
language  [Mylopoulos  et  al.  80,  Borgida  et  al.  82].  Taxis  offers  uniform  modeling  tools  for  the 
representation  of  semantic  knowledge,  based  on  ideas  from  semantic  networks  [Brachman  79]. 
A Taxis program is a collection of interrelated objects organized along three different
dimensions. Every object is an instance-of some other generic object (for example, ’john’ is an
instance  of  ’person’  and  ’person’  is  an  instance  of  ’person-class’).  Generic  objects  ("classes")  are 
organized into IS-A hierarchies according to their generality/specificity (thus ’student’ is
placed  below  ’person’  on  the  IS-A  hierarchy).  Objects  are  described  in  terms  of  their  parts  or 
components  (e.g.  the  parts  of  ’person’  are  ’name’,  ’address’  and  ’social-insurance-no’,  in  a 
rather  trivialized  world).  All  parts  of  a  Taxis  program  are  treated  in  terms  of  this  framework. 
Thus  data,  transactions,  exceptions  (which  may  arise  in  the  course  of  transaction  execution) 
and  processes  (modeling  long  term  events)  are  all  treated  as  objects  organized  along  the  three 
dimensions. 
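The three dimensions can be illustrated with a small Python sketch in which each object records what it is an instance of (classification), its IS-A parents (generalization), and its own parts (aggregation), with parts inherited down the IS-A hierarchy. This is not Taxis syntax: the `TaxisObject` class, the `is_instance` helper, and the extra 'university' part of 'student' are hypothetical, and here 'john' is declared a student and is therefore also an instance of 'person'.

```python
class TaxisObject:
    """Hypothetical encoding of the three Taxis dimensions."""

    def __init__(self, name, instance_of=None, is_a=(), parts=()):
        self.name = name
        self.instance_of = instance_of  # classification
        self.is_a = list(is_a)          # generalization/specialization
        self.own_parts = list(parts)    # aggregation

    def specializes(self, other):
        # True if self is other or lies below it on the IS-A hierarchy.
        return self is other or any(s.specializes(other) for s in self.is_a)

    def parts(self):
        # Parts declared on IS-A ancestors are inherited; ancestors' parts come first.
        result = []
        for part in [p for s in self.is_a for p in s.parts()] + self.own_parts:
            if part not in result:
                result.append(part)
        return result


def is_instance(obj, cls):
    # An instance of a class is also an instance of every class above it.
    return obj.instance_of is not None and obj.instance_of.specializes(cls)


person_class = TaxisObject("person-class")
person = TaxisObject("person", instance_of=person_class,
                     parts=["name", "address", "social-insurance-no"])
student = TaxisObject("student", instance_of=person_class, is_a=[person],
                      parts=["university"])
john = TaxisObject("john", instance_of=student)

print(student.parts())            # ['name', 'address', 'social-insurance-no', 'university']
print(is_instance(john, person))  # True
```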
Recently, we extended and simplified Taxis to facilitate the representation of syntactic
and pragmatic knowledge in an information system design [Pilote 83a,b,c]. The resulting
language, which we have named Interpret, maintains the Taxis representational framework
but applies it to the uniform representation of all three kinds of knowledge (semantic,
syntactic and pragmatic).

II.  Details  of  our  Approach  for  User  Interfaces 

We  believe  that  what  is  most  crucially  lacking  in  the  area  of  user  interface  design  is  a  set 
of tools to integrate various mechanisms that are well understood and useful, but each addresses
only limited aspects of the information system design problem.

[Pilote  83a]  shows  how  the  Interpret  formalism  can  be  applied  to  duplicate,  organize  and 
extend  results  from  many  of  the  main  approaches  to  user  interface  building  and  natural 
language  processing.  The  main  contribution  of  this  work  lies  in  the  way  in  which  it  integrates 
so  many  different  computing  formalisms  into  one  single,  highly  coherent  and  very  simple 
framework,  one  that  can  even  be  used  to  specify  a  large  part  of  its  own  formalization. 

What  is  probably  the  most  significant  departure  of  Interpret  from  other  programming 
languages  is  the  complete  and  explicit  separation  that  it  makes  between  semantic  and  syntactic 
aspects  of  an  application.  This  feature  is  already  implicitly  used  inside  compilers.  Separating 
names  and  grammatical  forms  from  the  semantic  effects  and  objects  they  represent  means  that 
we  can  model  and  explicitly  manipulate  these  syntactic  aspects.  This  mechanism  is  essential 
for  designing  user  interfaces,  where  a  programmer  must  be  in  complete  control  of  the  various 
inputs  that  a  user  might  submit  to  a  given  application. 

So-called "pragmatic" aspects of an application are most often buried in the control flow
of  a  program.  Interpret  introduces  internal  objects  to  model  various  states  of  a  computation 
and  the  various  state  transformations  that  are  allowed  between  these  states.  This  mechanism 
allows  the  specification  of  the  valid  sequences  of  events  for  each  step  of  an  application.  It  also 
allows  programmers  and  even  users  to  examine,  access,  and  even  manipulate  these  objects. 
The  resulting  mechanism,  called  "scripts",  is  a  simplification  and  an  extension  of  Taxis  scripts 
[Barron 81], which are themselves a combination of "Augmented Petri Nets" [Zisman 77] and
"Communicating Sequential Processes" [Hoare 78]. Note that states and scripts, like everything
else, are organized in terms of the three dimensions of the representational framework.
Thus, if a state class S1 is declared as IS-A another state class S2, then any transition valid in
S2 will also apply to an instance of S1. This is analogous to saying that if a data class, say
’student’, is defined as IS-A another data class, say ’person’, the first will inherit all the attributes
of  the  second. 
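The rule that a state class inherits the transitions of the state classes above it can be sketched in Python as follows. The `StateClass` type, the event names and the target states are hypothetical; letting a subclass override an inherited transition is an assumption of this sketch, not something stated above.

```python
class StateClass:
    """Hypothetical state class whose valid transitions are inherited along IS-A."""

    def __init__(self, name, is_a=None, transitions=None):
        self.name = name
        self.is_a = is_a
        self.own = dict(transitions or {})  # event -> name of the next state

    def transitions(self):
        # Start from the superclass's transitions; local ones add to (or override) them.
        inherited = self.is_a.transitions() if self.is_a else {}
        return {**inherited, **self.own}

    def next_state(self, event):
        table = self.transitions()
        if event not in table:
            raise ValueError(f"{event!r} is not valid in state {self.name}")
        return table[event]


s2 = StateClass("S2", transitions={"help": "HelpScreen", "quit": "Exit"})
s1 = StateClass("S1", is_a=s2, transitions={"admit": "AdmissionForm"})

print(s1.next_state("help"))   # HelpScreen (inherited from S2)
print(s1.next_state("admit"))  # AdmissionForm
```

This mirrors the data-class case: just as 'student' inherits the attributes of 'person', S1 inherits the transitions of S2, while S2 gains nothing from S1.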

The Interpret framework has been formalized in [Pilote 83a] in terms of Interpret
constructs. The formalization is essentially an "implementation on paper" using the language itself
as  vehicle.  To  avoid  circularity  problems,  this  account  of  Interpret  is  boot-strapped  from  a 
subset  of  Taxis  which  is  already  formalized  in  set  theoretical  terms  in  [Mylopoulos  and  Wong 
80]. 

In addition to this description of itself, Interpret has been tested on a number of
examples, including an account of the syntax and semantics of a subset of an English grammar
based  on  [Gazdar  82]. 

III.  Three  ways  to  simplify  the  design  of  User  Interfaces 

A  major  contribution  of  this  work  is  a  proposal  for  a  set  of  techniques  to  simplify  the 
design  of  user  interfaces  [Pilote  83a]: 

1. reduce the complexity of the problem by breaking it down into three components; this
allows the specification of user-oriented syntactic knowledge to be postponed until the
last step of the design;
2. maximize the use of syntactic information already specified in the declaration of data
types and programs; the grammar and the lexicon used for the design of a user interface
are available for reuse as part of the interface under design;

3.  part  of  the  design  job  can  be  offloaded  to  a  user  of  the  interface. 

III.1. A three-step design methodology

The  decomposition  of  a  user  interface  design  into  three  components  is  particularly 
significant  in  helping  potential  users  participate  in  the  design  of  a  new  user  interface.  Note 
that  unlike  any  existing  system  for  the  generation  of  user  interfaces,  including  natural 
language  front  ends,  it  is  proposed  here  to  represent  all  three  different  components  of  an 
interactive  information  system  in  terms  of  one  representational  framework,  thus  making  it 
easier  to  describe  the  interconnections  between  the  three  components. 

III.2. Maximize the use of predefined information

As  the  designer  builds  his  user  interface,  he  is  making  use  of  the  grammar  provided  by 
the  programming  language  (Interpret)  to  develop  a  lexicon  of  words  in  terms  of  which  he  can 
refer  to  objects.  This  information  is  treated  as  part  of  the  user  interface  under  development. 
Among  other  things,  this  allows  one  to  develop  a  user  interface  as  an  extension  of  an  existing 
Interpret  interface. 

III.3. Offloading to the user

Finally, a third way to reduce the user interface designer's job is to offload part of this job
to  a  user  of  the  resulting  interface.  First,  by  representing  much  of  the  above  mechanism  in  a 
sufficiently organized formalism (i.e. in terms of Interpret constructs), we increase the
understandability of the interface and reduce the effort required for a user to understand and
introduce  further  modifications  by  himself.  The  next  step  is  to  increase  the  flexibility  of  the 
vocabulary  by  introducing  synonyms  for  already  defined  identifiers  and  custom  paraphrases 
alongside standard English transformations.
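Keeping the lexicon separate from the objects it names is what makes this kind of user-side extension cheap: a synonym is just another entry pointing at an existing meaning. The sketch below assumes a simple word-to-object table; the `Lexicon` class and the `AdmitPatient` transaction name are hypothetical.

```python
class Lexicon:
    """Maps words to semantic objects, kept apart from the objects themselves (hypothetical)."""

    def __init__(self):
        self.entries = {}

    def define(self, word, meaning):
        self.entries[word.lower()] = meaning

    def add_synonym(self, new_word, existing_word):
        # A user extends the vocabulary without touching the semantic layer.
        key = existing_word.lower()
        if key not in self.entries:
            raise KeyError(f"unknown word: {existing_word}")
        self.entries[new_word.lower()] = self.entries[key]

    def lookup(self, word):
        return self.entries.get(word.lower())


lexicon = Lexicon()
lexicon.define("admit", "AdmitPatient")   # designer-supplied identifier
lexicon.add_synonym("check-in", "admit")  # user-supplied synonym
print(lexicon.lookup("Check-In"))         # AdmitPatient
```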

Our  approach  here  clearly  belongs  to  the  'language  engineering'  stream.  [Pilote  83a] 
describes  techniques  for  accessing  the  interface  itself,  thus  helping  the  user  understand  its 
features  and  limitations. 

IV.  Consolidate  Knowledge  Representation  Tools 

Another  major  contribution  of  Interpret  is  to  clarify,  simplify  and  extend  the  Taxis 
language  with: 

1. SNOBOL patterns and ATN-like facilities;

2. Class operations such as product, union and mapping, motivated by research on
Denotational Semantics.

In addition, we introduce a number of simplifications to existing Taxis constructs, most
particularly to properties, transactions, scripts, I/O primitives, exceptions and states.

IV.1. Integration of syntactic mechanisms

Perhaps the most important extension to Taxis is the notion of pattern, which defines
collections of analyzed lexical expressions, and can be used to represent grammatical knowledge.

Interpret patterns aim at combining the ease of use and completeness of BNF-like
grammars with the efficiency and control flow facilities of ATNs. In fact, they share many features
with the "semantic grammars" introduced by [Burton 75]. In the context of Interpret, this results in
the integration of a general pattern matching mechanism into a typed programming language.
Moreover, unlike semantic grammars, an Interpret grammar can be organized in
terms of the same principles used to organize other kinds of knowledge.
Interpret patterns allow the definition of local properties, conditions and actions on the
values  of  these  local  properties,  therefore  providing  facilities  comparable  to  those  of  ATNs 
[Woods  70].  These  features  also  make  patterns  comparable  to  "Attributed  Grammars"  [Knuth 
67],  increasingly  used  in  compiler  theory  to  assign  semantics  to  syntactic  constructs. 
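The combination of a grammar rule with local properties, a test on their values (as on an ATN arc), and an action computing a result (as with a synthesized attribute) can be sketched in a few lines of Python. This is not Interpret's pattern notation: the `Pattern` class, the flat token-by-token matching (no nesting, repetition or optionality) and the room-range example are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple, Union

# An element is either a literal word or a (property name, regex) pair.
Element = Union[str, Tuple[str, str]]


@dataclass
class Pattern:
    """A flat sequence pattern with local properties, a condition and an action (hypothetical)."""
    name: str
    elements: List[Element]
    condition: Callable[[dict], bool] = lambda props: True
    action: Callable[[dict], object] = lambda props: props

    def match(self, tokens: List[str]) -> Optional[object]:
        if len(tokens) != len(self.elements):
            return None
        props = {}
        for token, element in zip(tokens, self.elements):
            if isinstance(element, str):
                if token.lower() != element:  # literal word, compared case-insensitively
                    return None
            else:
                prop, regex = element
                if not re.fullmatch(regex, token):
                    return None
                props[prop] = token  # bind a local property
        # ATN-like test on the bound properties: reject the match if it fails.
        if not self.condition(props):
            return None
        # Attribute-grammar-like action: compute the meaning of the matched phrase.
        return self.action(props)


room_range = Pattern(
    "room-range",
    [("low", r"\d+"), "to", ("high", r"\d+")],
    condition=lambda p: int(p["low"]) <= int(p["high"]),
    action=lambda p: list(range(int(p["low"]), int(p["high"]) + 1)),
)

print(room_range.match("101 to 103".split()))  # [101, 102, 103]
print(room_range.match("105 to 103".split()))  # None
```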

An  interesting  side  effect  of  this  approach  is  that  representational  mechanisms  required 
for  syntactic  knowledge,  such  as  unions,  repetitions  and  optionality,  to  mention  just  a  few,  are 
generalized  to  apply  to  any  object,  syntactic  or  other. 

IV.2. Adding pragmatic mechanisms

The  pragmatic  mechanisms  of  Interpret  basically  preserve  most  of  the  corresponding 
Taxis  mechanisms,  as  described  in  [Barron  80,  81]  and  [Borgida  et  al.  82].  Interpret  however 
simplifies  and  extends  these  mechanisms  in  many  ways.  The  most  important  feature  of  the 
resulting  Interpret  pragmatic  component,  which  is  as  far  as  we  know  totally  unique  and  novel 
amongst  existing  approaches  to  design  or  represent  user  interfaces,  is  to  combine  semantic  and 
syntactic  information  in  a  way  that  allows  the  simplified  design  methodology  described  above. 

Another novel and significant feature of Interpret scripts is that they allow the definition of
orientation and help mechanisms in a fairly straightforward way.

IV.3. The Taxis representational framework revisited

Taxis was originally designed with world modelling as the intended application area. The
representation  of  syntactic  knowledge  only  became  an  issue  for  the  project  fairly  recently. 
[Pilote  83a]  deals  with  this  issue  by  going  back  to  the  representational  framework  of  Taxis,  and 
revising  it  in  line  with  the  latest  views  on  the  formal  semantics  of  programming  languages  and 
programming  language  design.  In  so  doing,  we  preserved  most  of  the  Taxis  mechanisms  but 
found them in several instances to be special cases of yet more general and powerful
mechanisms. The resulting framework and language are what we call Interpret.

One  extension  to  the  Taxis  framework  involves  the  introduction  of  union,  product  and 
mapping as operations on types. Another important extension involves the elevation of
properties to "objecthood". This means that in the Interpret framework there are property classes
and  property  instances  which  can  have  their  own  properties. 
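The three type operations can be illustrated over small finite classes represented as Python sets: union collects the instances of both classes, product forms pairs, and mapping forms the class of all total functions from one class to another (the function space familiar from denotational semantics). The function names and the staff/on-call example are hypothetical, and representing a function as a frozenset of pairs is a choice made only for this sketch; property objecthood is not modelled here.

```python
from itertools import product as cartesian
from typing import FrozenSet, Set, Tuple


def union(a: Set, b: Set) -> Set:
    return a | b


def product(a: Set, b: Set) -> Set[Tuple]:
    return set(cartesian(a, b))


def mapping(domain: Set, codomain: Set) -> Set[FrozenSet[Tuple]]:
    """All total functions from domain to codomain, each a frozenset of (arg, value) pairs."""
    args = sorted(domain)
    return {frozenset(zip(args, values))
            for values in cartesian(sorted(codomain), repeat=len(args))}


doctors = {"jones", "lee"}
nurses = {"kim"}
staff = union(doctors, nurses)
days = {"mon", "tue"}

on_call = mapping(days, staff)  # every way of assigning one staff member to each day
print(len(staff), len(product(days, staff)), len(on_call))  # 3 6 9
```

The sizes follow the usual arithmetic of these constructions: |A ∪ B| for disjoint classes is |A| + |B|, |A × B| is |A|·|B|, and the number of mappings from A to B is |B| raised to the power |A|.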

V.  Concluding  Remarks 

The main advantages of using a knowledge-based framework for the design of user
interfaces are threefold:

1.  the  capabilities  and  limitations  of  the  interface  are  easier  to  understand; 

2.  the  interface  can  be  developed  incrementally,  and  augmented  whenever  the  need  arises; 

3. "meta-knowledge" is supported, in the sense of allowing access to generic information
about  a  user  interface,  e.g.  its  grammar,  lexicon  and  dialogue  structure  through 
question-answering  techniques. 
A good notation is more than mere 'convenience', for it also allows us to structure our ideas hierarchically:
we  can  focus  our  attention  at  the  appropriate  level. 

- J.E. Stoy, Denotational Semantics, 1977


I.  Overview 

Programming  Language  researchers  increasingly  recognize  that  a  high  proportion  of 
application  development  costs  involve  the  interface  with  users  of  the  application,  including 
various  dialogues,  input  formats,  error  checking,  help  and  explanation  messages,  and  the  like. 
They  also  increasingly  recognize  that  maintenance  costs  tend  to  overshadow  development 
costs. These two factors compound each other's adverse effects: as user needs evolve, it is the
interface  with  a  system  which  generally  requires  the  most  maintenance.  The  user  relationship 
is  even  said  to  account  for  about  60  percent  of  the  maintenance  problem  [Lientz  and  Swanson 
81]. 

Surprisingly,  few  programming  language  constructs  are  designed  to  address  the  area  of 
user  interface  design.  On  the  contrary,  traditional  programming  language  constructs  are 
strongly  oriented  towards  improving  programmers’  effectiveness  in  developing  the  algorithmic 
and data manipulation aspects of an application. A programmer is basically left to reinvent,
each time, the procedures required to deal with user commands and inputs.

The  maintenance  problem,  meanwhile,  has  been  the  target  of  many  suggestions  for 
improvements.  Structured  programming,  automated  documentation,  abstract  data  types,  and 
the  like  are  tools  that  have  been  proposed  to  make  a  given  application  easier  to  understand 
and  to  modify  after  its  development.  Some  recent  proposals  in  this  direction  involve  keeping 
around, beside programming language code, a more abstract version of the application,
sometimes called a 'logical level', or 'model'. This idea, which evolved from Data Management
concerns, where it is called 'Data Modeling' [Tsichritzis & Lochovsky 81], increasingly
influences programming language design. A recent example of this approach is the TAXIS
language  [Mylopoulos  et  al.  80,  Borgida  et  al.  82,  83,  Greenspan  et  al.  82],  which  integrates 
most Data Modeling concepts with many Abstract Data Types mechanisms. Among a
'bewildering variety of knowledge representations ... one of the most complete is the TAXIS system
which  has  aspects  of  all  basic  kinds  of  knowledge'  that  can  be  distinguished  in  the  current 
projects  and  approaches  to  knowledge  representation  [Sowa  80]. 

TAXIS,  like  many  other  programming  languages,  was  designed  primarily  to  describe  data 
and  programs.  This  paper  reports  a  simplification  of  TAXIS,  which  has  been  extended  with 
mechanisms  specialized  in  the  description  of  user  interface  aspects  of  an  application.  To  this 
effect, the task of describing an application has been broken down into three almost
independent components, called 'semantic', 'syntactic' and 'pragmatic', according to a distinction
familiar  to  linguists,  and  introduced  by  [Morris  38]. 

We call 'semantic aspects' the description of data and programs that results from the
compilation  of  program  code  and  data  definitions,  with  the  addition  of  type  information, 
which  is  kept  around  in  TAXIS  instead  of  being  thrown  away  as  in  most  other  programming 
languages. We call 'syntactic' all aspects of an application that use symbols of various types to
display  and  identify  semantic  objects  for  human  users  or  programmers;  this  component 
includes  what  is  traditionally  known  as  symbol  tables,  whose  contents  are  also  generally 
thrown  away  after  compilation  in  most  other  approaches,  and  also  includes  all  formatting 
information used to recognize user inputs which, when complicated enough, are called
'grammars'. Finally, we call 'pragmatic' all aspects that specify sequences of events, dialogues, and
control flow.

The  most  significant  feature  of  the  resulting  language,  which  we  call  ’INTERPRET’,  is 
the  extent  to  which  the  various  mechanisms  required  to  represent  these  different  components 
are  integrated  in  a  simple  and  highly  uniform  framework  and  notation.  This  language 
simplifies  and  extends  TAXIS  to  allow  for  the  description  of  all  aspects  of  user-interfaces 
[Pilote  83]. 

II.  Detail  of  our  Approach 

We  believe  that  what  is  most  crucially  lacking  in  the  area  of  user  interface  design  is  a  set 
of  tools  to  integrate  various  mechanisms  that  are  well  understood  and  useful,  but  each 
addressing  only  limited  aspects  of  the  problem. 

The  essence  of  our  approach  is  a  uniform  representation  framework,  able  to  describe  and 
provide  access  to  its  own  definition.  We  call  this  representation  framework  ’INTERPRET’  by 
analogy  with  the  main  task  of  a  user  interface,  which  is  to  analyze  user  inputs  immediately, 
and  trigger  corresponding  events,  as  is  done  in  interpreters  as  opposed  to  compilers.  We  also 
use this name to acknowledge a strong influence from the so-called 'interpreter-oriented'
methods  of  describing  the  formal  semantics  of  programming  languages,  such  as  Denotational 
Semantics  [Gordon  79],  on  the  design  of  the  INTERPRET  language. 

What is probably the most significant departure of INTERPRET from other
programming languages is the complete and explicit separation that it makes between semantic and
syntactic  aspects  of  an  application.  This  feature  is  already  implicitly  used  inside  compilers. 
Separating  names  and  grammatical  forms  from  the  semantic  effects  and  objects  they  represent 
means  that  we  can  model  and  explicitly  manipulate  these  syntactic  aspects.  This  mechanism  is 
essential  for  designing  user  interfaces,  where  a  programmer  must  be  in  complete  control  of  the 
various  inputs  that  a  user  might  submit  to  a  given  application. 

The Data Modeling foundation of INTERPRET again can commonly be recognized
inside compilers. To say that a particular real world object or event is modeled by some
'internal' objects, to use the terminology introduced by [Abrial 74], is very much what happens
when  a  particular  data  structure  or  piece  of  code  is  assigned  to  represent  this  object  or  event. 
The  main  difference  is  that  the  Data  Modeling  approach  makes  this  mechanism  explicit. 
INTERPRET  in  particular  also  distinguishes  the  resulting  objects  from  the  syntactic  mechan¬ 
isms  used  to  represent  these  objects  and  introduces  other  internal  objects  to  model  these  syn¬ 
tactic  aspects  involved.  Identifiers  and  even  grammar  rules  can  thus  be  explicitly  manipulated 
in  INTERPRET. 

INTERPRET goes further than explicitly representing syntactic mechanisms: it also models the so-called 'pragmatic' aspects of an application, which are most often buried in the control flow of a program. INTERPRET introduces internal objects to model the various states of a computation and the state transformations that are allowed between these states. This mechanism allows the specification of the valid sequences of events for each step of an application. It also allows programmers, and even users, to examine, access, and manipulate these objects. The resulting mechanism, called 'scripts', is both a simplification and an extension of TAXIS scripts [Barron 81], which are themselves a combination of 'Augmented Petri Nets' [Zisman 77] and 'Communicating Sequential Processes' [Hoare 78].

II.1. Basic INTERPRET framework

As mentioned above, INTERPRET is a simple and uniform framework used to support all three types of mechanisms required for user interfaces. This framework considers three basic types of objects: tokens, classes and metaclasses. Tokens are indivisible units of information, usually modeling actual entities in an application domain. Classes correspond to collections of tokens sharing some common 'properties' (defined below); these tokens are said to be instances of the class. The INSTANCE-OF relationship relates an object, e.g. John, to a class of which it is an instance, e.g. PERSON. Similarly, classes can themselves be grouped into higher-level classes, called metaclasses.

All (meta)classes constituting a TAXIS or INTERPRET program are organized into a hierarchy by the binary relation IS-A, which is a partial order. The IS-A relationship will sometimes be referred to as 'specialization' when going from more general to less general classes, or conversely as 'generalization'. The IS-A relationship relates a class, e.g. STUDENT, to a more general one, e.g. PERSON. In particular, relations, transactions, and exceptions are all treated as classes, defined through the properties that relate them to other classes and organized into a hierarchy by the IS-A relation.

The IS-A and INSTANCE-OF relationships are similar to the mathematical subset and membership relations, respectively. The main difference between the IS-A relation of TAXIS and INTERPRET and the traditional subset relation is that IS-A holds between definitions of classes, i.e. it holds even between classes with no instances. The IS-A relationship thus acts very much like a constraint on class instances, and the subset relation between the sets of instances of IS-A-related classes is simply a side-effect of the definition of a particular IS-A relation.
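To make the distinction concrete, here is a minimal Python sketch under simplifying assumptions of our own: classes are plain objects with explicit IS-A links, and the names `Klass`, `is_a`, `extension` and the class GRAD-STUDENT are illustrative, not INTERPRET constructs. IS-A is decided from the definitions alone, so it holds even for a class with no instances, while the subset relation between instance sets falls out as a consequence.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based equality: two classes are never "equal" by content
class Klass:
    name: str
    parents: list[Klass] = field(default_factory=list)  # direct IS-A links
    members: set[str] = field(default_factory=set)      # direct INSTANCE-OF links

    def is_a(self, other: Klass) -> bool:
        """Reflexive, transitive IS-A, decided from class definitions only."""
        return self is other or any(p.is_a(other) for p in self.parents)

    def extension(self, all_classes: list[Klass]) -> set[str]:
        """All instances: direct members plus members of every specialization."""
        result: set[str] = set()
        for c in all_classes:
            if c.is_a(self):
                result |= c.members
        return result


PERSON = Klass("PERSON")
STUDENT = Klass("STUDENT", parents=[PERSON])
GRAD = Klass("GRAD-STUDENT", parents=[STUDENT])  # hypothetical class, no instances
classes = [PERSON, STUDENT, GRAD]

STUDENT.members.add("John")
print(GRAD.is_a(PERSON))          # True, although GRAD-STUDENT has no instances
print(PERSON.extension(classes))  # {'John'}
```

The subset property is never stored: `STUDENT.extension(classes)` is contained in `PERSON.extension(classes)` only because STUDENT IS-A PERSON.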

Classes and metaclasses model conceptual objects which are 'defined' by their relations to other concepts and by the operations that are allowed on their instances, much in the spirit of 'Abstract Data Types' in Programming Languages. Both these relations and these operations are viewed as definitional properties attached to (meta)classes. These definitional properties restrict the factual properties that can be defined on instances of these classes.

Using expressions formally described in section II.3 below, we write for example:

property age on PERSON is {0..200}

to specify that the age of a particular person, say John, must be in the range {0..200}. The following expression is then acceptable:

John.age ← 22

meaning that the value of the property age, when applied to the object John, becomes the number 22.
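A minimal sketch of how a definitional property can restrict factual properties, assuming definitional properties are stored as a table from class and property name to a set of allowed values; `DEFINITIONAL`, `assign` and `ConstraintError` are hypothetical names, and the range {0..200} is taken to be inclusive.

```python
class ConstraintError(ValueError):
    """Raised when a factual property violates a definitional property."""


# Definitional properties: class -> property -> allowed values.
# 'property age on PERSON is {0..200}' becomes an inclusive range.
DEFINITIONAL = {"PERSON": {"age": range(0, 201)}}


def assign(token: dict, cls: str, prop: str, value: int) -> None:
    """Perform token.prop <- value, checked against the definition of cls."""
    allowed = DEFINITIONAL.get(cls, {}).get(prop)
    if allowed is None:
        raise ConstraintError(f"{prop} is not defined on {cls}")
    if value not in allowed:
        raise ConstraintError(f"{prop} = {value} is outside the range declared on {cls}")
    token[prop] = value


john: dict = {}
assign(john, "PERSON", "age", 22)  # John.age <- 22 is accepted
print(john)                        # {'age': 22}
```

An assignment such as `assign(john, "PERSON", "age", 250)` would raise `ConstraintError` and leave the token unchanged.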

A new feature of INTERPRET over TAXIS is that it considers properties as objects. This means that a property category is itself a class of objects in INTERPRET, whose instances are properties. The most general class of properties is called 'property' (or alternatively 'properties'), of which all other property categories are specializations.

The semantics of each conceptual object in an application domain involves both a behavioral and a structural component, expressed in terms of the notions of class, property and the IS-A hierarchy. Structural constraints are illustrated by the above example, where the factual properties of instances are restricted by the corresponding definitional properties of their class. Another structural constraint is that the operations and properties of a class must be specializations of the corresponding operations and properties of a more general class, if they are also defined on that class. Behavioral constraints specify that the effects of a particular operation on a class must be preserved by the corresponding operations on specializations of this class.

IS-A relationships can be defined between any classes of objects modeled in INTERPRET, including data, transaction, property, lexical, grammatical, transition, script and state classes. For example, if a state class S1 is declared IS-A another state class S2, then any transition valid in S2 will also apply to an instance of S1.
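The inheritance of transitions along IS-A between state classes can be sketched as a simple upward lookup. This assumes one table of transitions per state class; the state names S1 and S2 follow the text, while the transition names `Cancel` and `GetCourse` and the helper `valid_transitions` are hypothetical.

```python
from __future__ import annotations

IS_A = {"S1": ["S2"], "S2": []}                        # S1 IS-A S2
TRANSITIONS = {"S2": {"Cancel"}, "S1": {"GetCourse"}}  # transitions defined per state class


def valid_transitions(state_class: str) -> set[str]:
    """Transitions defined on state_class or inherited through IS-A."""
    result = set(TRANSITIONS.get(state_class, set()))
    for parent in IS_A.get(state_class, []):
        result |= valid_transitions(parent)
    return result


print(sorted(valid_transitions("S1")))  # ['Cancel', 'GetCourse']
```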

II.2. Formalization

The INTERPRET framework has been formalized in [Pilote 83] in terms of INTERPRET constructs. This formalization is peculiar in that it lies halfway between a formalization in the traditional sense and an implementation: a traditional formalization would be very precise but understandable only by a few experts, while an implementation would be understandable by a large audience but very lengthy and cumbersome. The formalization of INTERPRET is instead an 'implementation on paper' using INTERPRET itself as the vehicle. To avoid circularity problems, this description of INTERPRET is bootstrapped from a subset of TAXIS which has already been formalized in set-theoretical terms in [Mylopoulos and Wong 80].

INTERPRET has been refined by using it extensively on many examples in the 'University World', on the description of the syntax and semantics of an English grammar based on [Gazdar 82], and most of all on the description of itself, which exercised all of its features and led to many simplifications.

II.3. Notation

The basic uniformity of the INTERPRET framework is reflected by a uniform and simple syntax. All objects and classes of objects can be defined using the following basic format:

Class Object [ 'with' PropertyDefinition 'end' ]

using a BNF-like notation where quoted words and characters are constants and square brackets delimit optional constructs.

II.3.1. Class definitions

To define a class object, Class must be an existing metaclass identifier, and Object is an as yet unused identifier, which becomes the name of the class; PropertyDefinition specifies a "definitional property" and has the format

PropertyCategory ( PropertyName ':' ClassReference ';' )+ [ PropertyDefinition ]

where '( ... )+' indicates one or more repetitions of its content. PropertyCategory stands for the name of a class of properties, which is constrained by the type of class being defined, i.e. by the value of the metaclass. ClassReference is either the name of another class of objects or the explicit specification of an unnamed 'simple class'.

Furthermore, classes can be defined as 'specializations' of previously defined classes, using the format:

MetaClassName ClassName 'isa' ClassName ( ',' ClassName )*

In such cases, all properties defined on the more general classes are considered to also apply to the specialized one.

II.3.2. Token definitions

Class names can in turn be used to define tokens as follows:

ClassName TokenName [ 'with' PropertyDefinition ]

where each PropertyDefinition this time has the format

PropertyName '←' Value [ ',' PropertyDefinition ]

where Value designates an object which must be an instance of the class associated with PropertyName in the definition of the class identified by ClassName.

II.3.3. Property access

The value of a "factual property" of an object can be accessed using the format

Token-or-Class '.' PropertyName

and the range of a "definitional property" is given by the format

Class-or-MetaClass PropertyName

One of the types of properties that can be defined on any type of object is conditions. Such properties are most familiar to programming language practitioners as prerequisites on a transaction, but they can also be used to restrict the possible instances of a data class, much as in Abstract Data Type specifications.

The actions used in this paper similarly have a uniform notation, which can be expressed by the general format

Action [ VariableName 'in' ] ClassName [ 'with' PropertyReference ( ',' PropertyReference )* ]

where

PropertyReference := PropertyName ( '←' | '=' ) ObjectReference
Action := { 'get-object', 'insert-object', 'call', 'instantiate' }

When an ordering is specified on the arguments of an operation, as in the last two cases for transactions and scripts, the familiar 'functional' notation can be used as an abbreviation. This means, for example, that if A takes the arguments p and q, then the expression 'A with p ← a, q ← b' can be written A(a, b). An argument with no value is then specified by leaving its position empty; for example, A( , b) is equivalent to the expression 'A with q ← b'.
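The abbreviation is purely positional, so it can be sketched as a rewrite from positional to keyword form. In this sketch, which is not part of INTERPRET, an empty position is represented by `None`, `<-` stands for the assignment arrow, and `ARGUMENT_ORDER` is a hypothetical table holding the ordering declared for each operation.

```python
from __future__ import annotations

# Declared argument order per operation (hypothetical table; A, p, q follow the text).
ARGUMENT_ORDER = {"A": ["p", "q"]}


def expand(op: str, *args: str | None) -> str:
    """Rewrite op(a1, a2, ...) as "op with name <- value, ..."; None marks an empty position."""
    names = ARGUMENT_ORDER[op]
    if len(args) > len(names):
        raise TypeError(f"{op} takes at most {len(names)} arguments")
    refs = [f"{n} <- {a}" for n, a in zip(names, args) if a is not None]
    return f"{op} with {', '.join(refs)}" if refs else op


print(expand("A", "a", "b"))   # A with p <- a, q <- b
print(expand("A", None, "b"))  # A with q <- b
```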

III. Representation of Syntactic mechanisms

A distinctive characteristic of INTERPRET, compared with TAXIS and most other programming languages, is that we explicitly model all syntactic objects, i.e. there is an internal object for each printable object in the interface. Syntactic aspects of a particular application can be separated into lexical and grammatical information. In this paper, lexical and grammatical class names are prefixed with the symbols '%' and '&' respectively. This is only a convention, making it easier to recognize the names of syntactic classes.

III.1. Lexical Information

Lexical objects are much like identifiers in a symbol table; the most important property of a lexical object is its 'printable form', i.e. a particular string of characters which can be used to display the lexical object or to recognize it in an input expression. Another important property of many lexical objects is their connection with the semantic object they represent. We explicitly model this connection with the property den. The value of the den property of a lexical object can then be explicitly assigned or modified by a programmer, providing complete control, in particular, over symbol table information.

Several syntactic objects can be defined as denoting the same semantic object by using the relation trans between them, specifying that one is a translation of the other. For example, given the lexical token 'John Smith', the identifier 'John', the token John, and the following factual properties:

'John Smith'.trans = 'John'

'John'.den = John

we can derive the following equalities:

'John Smith'.den
    = 'John Smith'.trans.den
    = 'John'.den
    = John

although we gave no direct representation of the factual property

'John Smith'.den = John

In summary, the essence of our approach to syntactic customization is to introduce intermediate translations leading to some denotation predefined by a programmer.
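The derivation amounts to following trans links until an object with an explicit den is reached. A small sketch, assuming dictionaries stand in for the factual properties trans and den; the cycle guard is our addition, since the text does not say what happens if translations loop.

```python
from __future__ import annotations

# Stand-in for the internal token John (any unique object would do).
JOHN = object()

TRANS = {"John Smith": "John"}  # 'John Smith'.trans = 'John'
DEN = {"John": JOHN}            # 'John'.den = John


def den(lexical: str) -> object:
    """Follow trans links until a lexical object with an explicit den is found."""
    seen: set[str] = set()
    while lexical not in DEN:
        if lexical in seen or lexical not in TRANS:
            raise LookupError(f"no denotation for {lexical!r}")
        seen.add(lexical)
        lexical = TRANS[lexical]
    return DEN[lexical]


print(den("John Smith") is JOHN)  # True
```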
III.2. Grammatical Information

Grammatical rules are also modeled by internal objects in INTERPRET. Grammar rules typically translate an expression into another one, based on a "pattern" which decomposes the first expression, and another pattern which rebuilds a new expression from the identified components. A grammar rule is then modeled as a property trans defined on a class of INTERPRET syntactic objects called patterns; patterns are therefore used both as the domain and as the range of a trans property.

A pattern, or grammatical class, is a class of lexical expressions, each of which in turn is a sequence of lexical tokens or smaller lexical expressions. If a pattern "matches" a lexical expression, then this expression is said to be an instance of the grammatical class defined by this pattern. A pattern is written as a sequence of syntactic objects between double quotes, where syntactic objects are either lexical tokens, (names of) lexical classes, or (names of) grammatical classes. In the last two cases, the syntactic object is delimited by angle brackets.

A grammatical class is allowed to have "local variables", as is done in SNOBOL4, to collect fragments of a lexical expression whenever they match components of a pattern. The range of these variables is then specified in other properties of the grammatical class, as in:

G-CLASS "What <is-are> the <attr-set> of <person> ?" with
    locals
        is-are: {'is', 'are'};
        attr-set: &ATTR-SET;
        person: %PERSON;
end

This format is similar to the usual syntax of TAXIS and INTERPRET constructs, and is also convenient for assigning other properties of grammatical classes.
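A minimal sketch of matching against a grammatical class with local variables, assuming the input has already been scanned into words and that each local matches exactly one word drawn from a finite set. The real ranges are lexical or grammatical classes (&ATTR-SET may span several words), so this is a deliberate simplification; the sample vocabulary and the function `match` are hypothetical.

```python
from __future__ import annotations

import re

PATTERN = "What <is-are> the <attr-set> of <person> ?"

# Ranges of the local variables; the word lists are hypothetical stand-ins
# for &ATTR-SET and %PERSON, reduced here to single words.
LOCALS = {
    "is-are": {"is", "are"},
    "attr-set": {"age", "courses"},
    "person": {"John"},
}
SLOT = re.compile(r"<([\w-]+)>")


def match(pattern: str, words: list[str]) -> dict[str, str] | None:
    """Bind each <local> to one word; return None if the pattern does not match."""
    slots = pattern.split()
    if len(slots) != len(words):
        return None
    bindings: dict[str, str] = {}
    for slot, word in zip(slots, words):
        m = SLOT.fullmatch(slot)
        if m:  # a local variable: the word must lie in its declared range
            if word not in LOCALS[m.group(1)]:
                return None
            bindings[m.group(1)] = word
        elif slot.lower() != word.lower():  # a literal lexical token
            return None
    return bindings


print(match(PATTERN, "What is the age of John ?".split()))
# {'is-are': 'is', 'attr-set': 'age', 'person': 'John'}
```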

III.3. Pattern matching

To recognize an input expression, represented as a string of characters (delimited by single quotes in our notation), the following operations must take place. First, the string must be broken down into meaningful syntactic objects; this phase is traditionally called "scanning". Then the resulting sequence of syntactic objects must be matched against the available grammar rules; this process is correspondingly called "parsing".

Note that we do not commit ourselves to any parsing strategy here. What is important for our purpose is to describe the relationships that exist between lexical expressions and grammatical classes.

The distinction between lexical tokens and lexical expressions is quite arbitrary and depends on the particular application. In general, lexical tokens will contain the representation of "words" or "identifiers", while lexical expressions will contain sequences of such words. The relationship between character strings, lexical tokens, and lexical expressions is as follows:

STATE with input: %CHARACTER*
    |  scan
    v
STATE with input: %L-TOKEN*
    |  parse
    v
STATE with input: &L-EXPRESSION

where the suffix '*' indicates an arbitrary repetition of its argument. For simplicity of exposition, we will use sequences of characters between single quotes as the notation for both strings and the corresponding "analyzed expressions", but the reader should keep in mind the intermediate steps.
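The two phases can be illustrated with a toy pipeline. Everything here is an assumption for illustration: the lexicon, the greedy longest-match scanner (which lets a multi-word name such as 'John Smith' become a single lexical token), and the single hand-written rule; the text itself commits to no parsing strategy.

```python
from __future__ import annotations

# Hypothetical lexicon: printable form -> lexical class.
LEXICON = {"enrol": "%ENROL", "john smith": "%STUDENT", "in": "IN", "csc148": "%COURSE"}


def scan(chars: str) -> list[tuple[str, str]]:
    """%CHARACTER* -> %L-TOKEN*: greedy longest match over whitespace-separated words."""
    words, tokens, i = chars.lower().split(), [], 0
    while i < len(words):
        for j in range(len(words), i, -1):  # try the longest span first
            phrase = " ".join(words[i:j])
            if phrase in LEXICON:
                tokens.append((LEXICON[phrase], phrase))
                i = j
                break
        else:
            raise ValueError(f"unknown word {words[i]!r}")
    return tokens


def parse(tokens: list[tuple[str, str]]) -> dict[str, str]:
    """%L-TOKEN* -> &L-EXPRESSION with one toy rule: %ENROL [%STUDENT] ['in' %COURSE]."""
    if not tokens or tokens[0][0] != "%ENROL":
        raise ValueError("expected an enrolment verb")
    expr = {"verb": tokens[0][1]}
    rest = tokens[1:]
    if rest and rest[0][0] == "%STUDENT":
        expr["student"] = rest[0][1]
        rest = rest[1:]
    if rest:
        if len(rest) != 2 or rest[0][0] != "IN" or rest[1][0] != "%COURSE":
            raise ValueError("expected 'in' followed by a course")
        expr["course"] = rest[1][1]
    return expr


print(parse(scan("Enrol John Smith in CSC148")))
# {'verb': 'enrol', 'student': 'john smith', 'course': 'csc148'}
```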
III.4. Language generation

An important use of patterns in INTERPRET, besides matching object descriptions, is to generate syntactic representations for INTERPRET objects in general. The INTERPRET construct that produces the printable description of an object is a pattern in which object references are delimited by angle brackets. For example, the number 22 can be transformed into the string '22' by writing the expression "< 22 >". Similarly, an expression like "2 + < 22 >" will result in the lexical expression '2 + 22'. In more detail, what is happening is that we have a pattern &P := "2 + <x: INTEGER>", which stands for a class of lexical expressions depending on the value of x; when we give a value to this variable x, e.g. by writing &P with x ← 22, we obtain an instance of &P, which is written '2 + 22'.
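Generation can be sketched as substituting values for a pattern's variables. This is an illustrative sketch rather than INTERPRET's mechanism: slots are written `<name: CLASS>`, only the class INTEGER is checked, and `instantiate`, `SLOT` and `CLASS_CHECKS` are hypothetical names.

```python
import re

# <name: CLASS> slots; only INTEGER is checked in this sketch.
SLOT = re.compile(r"<\s*(\w+)\s*:\s*([\w%&-]+)\s*>")
CLASS_CHECKS = {"INTEGER": lambda v: isinstance(v, int) and not isinstance(v, bool)}


def instantiate(pattern: str, **values: object) -> str:
    """Produce the lexical expression obtained by giving values to the pattern's variables."""
    def fill(m: re.Match) -> str:
        name, cls = m.group(1), m.group(2)
        value = values[name]
        if not CLASS_CHECKS[cls](value):
            raise TypeError(f"{name} must be an instance of {cls}")
        return str(value)
    return SLOT.sub(fill, pattern)


P = "2 + <x: INTEGER>"       # the pattern &P
print(instantiate(P, x=22))  # 2 + 22, i.e. &P with x <- 22
```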

III.5. Semantic-syntactic Interaction

In the same way as for lexical information, the construct that relates an expression in the INTERPRET language to the object that it represents is the function den. When applied to a valid INTERPRET expression, this function is very similar to the LISP function EVAL.

As an example of this last mechanism, if x ← '3 + 4', then

x.den
    = '3 + 4'.den
    = 7

These two mechanisms can be combined at will, and even arbitrarily nested. For example:

"2 + < '3 + 4'.den >" = "2 + < 7 >"
    = '2 + 7'

Note the distinction here between lexical expressions (delimited by single quotes) and patterns (between double quotes). It is useful to recall here that, while "2 + <x: INTEGER>" is a pattern, the expression "2 + < 7 >" stands for an instance of this pattern.
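To illustrate den acting like EVAL, and its nesting inside generation, here is a sketch for a tiny language of integer sums. Parsing with Python's standard `ast` module and accepting only `+` on integer literals is our choice, made to avoid calling `eval()` on arbitrary text; it is not how INTERPRET evaluates expressions.

```python
import ast


def den(expression: str) -> int:
    """Denotation of a lexical expression built from integers and '+'."""
    def walk(node: ast.AST) -> int:
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
            return walk(node.left) + walk(node.right)
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return node.value
        raise ValueError(f"unsupported expression: {expression!r}")
    return walk(ast.parse(expression, mode="eval"))


x = "3 + 4"
print(den(x))                 # 7, i.e. x.den = '3 + 4'.den = 7
print(f"2 + {den('3 + 4')}")  # 2 + 7, i.e. "2 + < '3 + 4'.den >"
```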

IV. Example

IV.1. Description of the application

A user interface can be defined as any mechanism used to mediate between users and applications. Such mechanisms range from all input/output formatting, in the simplest case, up to the most complex cases of so-called 'natural language' front-ends to data management systems.

In this range of interfaces, going from rigid input formats and command languages to systems trying to make sense of arbitrary English expressions, the latter obviously present the more difficult problem. But any solution to this larger problem can also be applied to represent simpler mechanisms. The following discussion will therefore concentrate on the problems and issues involved in building the more sophisticated types of user interfaces, with the understanding that these solutions also apply to simpler cases. We will, however, restrict our attention to natural language systems used in practical situations today, typified by LIFER [Hendrix et al. 78], as opposed to the more advanced ones still under research.

The following section presents a highly simplified example of the kinds of information that must be incorporated in a user interface to make it truly 'flexible'. Although restricted, this example also illustrates the complexity of the phenomena to be handled, most of which are only skimmed over in many interfaces aiming at 'user-friendliness'.

As an example of the various mechanisms that are required to support user interfaces, we present the following dialogue fragment, in which the user enrolls a student in a particular course:

> Enrol John Smith.

System: In which course?

> CSC148

System: OK. John Smith enrolled in CSC148.

The particular user interface to be described, which supports the above exchange, will also accept expressions such as the following as triggering a "student enrollment" event:

Enrol                      Enrol in CSC 148
Register John Smith        Enrol John Smith in CSC 148
Put in CSC 148             Place John Smith in CSC 148

with a few additional variations using different verbs.

This example of enrolling John Smith does not at first seem very difficult or worthy of so much attention: prompting for a course and remembering the student's name can be done ad hoc. What matters is that it is difficult to provide these features universally and flexibly throughout the system. The example thus illustrates the importance of generality and intelligence in user interfaces, which requires handling each of the three aspects (semantic, syntactic and pragmatic) that we distinguished in a user interface.

The most interesting feature of this example is that it illustrates how the same "semantic" action can be triggered by various combinations of "syntactic" and "pragmatic" information. In particular, the example shows how a user can specify too little or too much information with respect to a particular system request at a given time. In the first case, as with the missing reference to a course in our example, the system must engage in a clarification dialogue with the user, in which the user's answers are likely to be very cryptic and must be understood in the light of the expected information. In the second case, we want the system to retain the extra information and use it whenever it applies to a subsequent request; for example, the reference to a student in the first sentence of our example is kept and used when necessary instead of querying the user.

IV.2. Semantic aspects

To achieve the above effect, we need to model the various objects and events involved. Modeled objects and events are represented by data and transaction objects respectively; these "internal" objects, in the sense of [Abrial 74], are "instances" of the INTERPRET "classes" defined in Appendix 1.

The transaction class ENROL defines a class of programs as follows. Each instance of ENROL represents a function taking a student and a course as arguments and recording, if possible, the 'enrollment' of the student in the course. A student is represented by an instance s of the data class STUDENT, and a course by an instance c of COURSE. An enrollment is represented by adding one to the value of the size property of the course c, and adding c to the list courses defined as a property of the student s.

Data objects can be referenced and modified by INTERPRET expressions like:

get-object john in PERSON with name = 'John Smith';          (a1)

insert-object john in STUDENT;                                (a2)

get-object csc148 in COURSE with
    course# = 148, dept = 'Computer Science';                 (a3)

An expression like 'get-object variable-name in CLASS with property = value' can be viewed, in terms more familiar to many readers, as a database "query", and an expression like 'insert-object variable-name in CLASS' as an "update" on a file, relation, or the like. After the above assignment of values to the variables john and csc148, an instance of the ENROL transaction can be "called" with the expression:

call ENROL(john, csc148);                                     (a4)
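For readers who think in terms of queries and updates, the semantic component can be mimicked in plain Python. This is a sketch under our own assumptions: in-memory lists play the role of the classes PERSON, STUDENT and COURSE, `get_object`, `insert_object` and `enrol` are hypothetical helpers, `number` stands for course#, and the check that s is a student is our addition. Only the effect of ENROL (size incremented, course appended to courses) comes from the text.

```python
from dataclasses import dataclass, field


@dataclass
class Course:
    dept: str
    number: int  # stands for course#
    size: int = 0


@dataclass
class Person:
    name: str
    courses: list = field(default_factory=list)


# In-memory stand-ins for the classes PERSON, COURSE and STUDENT.
PERSONS = [Person("John Smith")]
COURSES = [Course("Computer Science", 148)]
STUDENTS: list = []


def get_object(objects, **props):
    """Query: the first object whose properties equal the given values."""
    return next(o for o in objects if all(getattr(o, k) == v for k, v in props.items()))


def insert_object(obj, objects):
    """Update: make obj an instance of the class represented by objects."""
    if obj not in objects:
        objects.append(obj)


def enrol(s, c):
    """ENROL: record, if possible, the enrollment of student s in course c."""
    if s not in STUDENTS:  # our assumption about what can fail
        raise ValueError(f"{s.name} is not a student")
    c.size += 1            # one more enrollment in c
    s.courses.append(c)    # c added to the courses of s


john = get_object(PERSONS, name="John Smith")                     # (a1)
insert_object(john, STUDENTS)                                      # (a2)
csc148 = get_object(COURSES, number=148, dept="Computer Science")  # (a3)
enrol(john, csc148)                                                # (a4)
print(csc148.size, [c.number for c in john.courses])               # 1 [148]
```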


IV.3. Dialogue aspects

A user could obtain essentially the same result as with the above semantic component more simply by letting the program guide the interaction and explicitly ask for the needed information.

To enrol a student in a course using the transaction ENROL introduced above, the various sequences of events that we want to represent can be informally expressed as follows:

1. start
2. at any time, cancel (and then terminate at step 3), or
   2.1 in any order
       2.1.1 get student argument
       2.1.2 get course argument
   2.2 call ENROL;
       if there is an error, then get another course argument and repeat 2.2,
       else if OK, then terminate (at step 3)
3. end

INTERPRET 'scripts' are used to specify the allowable successions of input/output interactions between the user and the program; in this case, an instance of ENROL-SCRIPT, defined in Appendix 2 and represented graphically in Figure 1, performs this role and also provides an interface between the ENROL program and its users, gathering the appropriate arguments and calling the program as in (a4) above.

For example, an expression like

instantiate ENROL-SCRIPT(user1, john);

will define a new instance of ENROL-SCRIPT which will remain active as long as the enrollment of 'john' has not been completed or canceled.


An instance of ENROL-SCRIPT represents a function from a state named 'start' to a new state named 'end'. This script also takes as another argument an instance of TERMINAL-CODE, which identifies an external entity (a user or another script) that will send messages to the script and receive its replies. This script has three properties, which can either be assigned values directly when the script is instantiated, or for which the script will explicitly request values. As in Petri nets, any successful transition deactivates its argument state(s) and activates its target state(s). As soon as one transition activates end, the script terminates.

An instance of ENROL-SCRIPT is initially in the state named 'start'. Only one transition, Initialize, is defined on this state; since it has no conditions and no actions properties, its only effect is to disable start and concurrently activate the states waitStudent, waitCourse, and waitCancel. These activations in turn "enable" the transitions GetStudent, GetCourse, and GetCancel, which may 'fire' if their actions property succeeds; any of these transitions can fire independently of the two others, since their argument states are distinct. In particular, at any point during the remaining life of the script, the transition GetCancel may succeed by receiving the message 'Bye', and then terminate the script.
Fig. 1 - ENROL-SCRIPT
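The firing rule described above can be sketched in a few lines. This is an approximation under stated assumptions: the net below is our reading of the states and transitions named in the text, transition actions and exception states are omitted (every enabled transition is taken to succeed), and `NET`, `enabled` and `fire` are hypothetical names.

```python
from __future__ import annotations

# Each transition: (argument states, target states).
NET = {
    "Initialize": ({"start"}, {"waitStudent", "waitCourse", "waitCancel"}),
    "GetStudent": ({"waitStudent"}, {"gotStudent"}),
    "GetCourse": ({"waitCourse"}, {"gotCourse"}),
    "GetCancel": ({"waitCancel"}, {"end"}),
    "CallEnrol": ({"gotStudent", "gotCourse"}, {"end"}),
}


def enabled(active: set[str]) -> list[str]:
    """Transitions whose argument states are all active."""
    return [t for t, (src, _) in NET.items() if src <= active]


def fire(active: set[str], transition: str) -> set[str]:
    """Deactivate the argument states and activate the target states."""
    src, dst = NET[transition]
    if not src <= active:
        raise ValueError(f"{transition} is not enabled")
    return (active - src) | dst


active = {"start"}
for t in ["Initialize", "GetStudent", "GetCourse", "CallEnrol"]:
    active = fire(active, t)
# The script terminates as soon as 'end' is active, whatever else is marked.
print("end" in active, sorted(active))  # True ['end', 'waitCancel']
```

Because GetStudent, GetCourse and GetCancel have disjoint argument states, they can fire in any order, which is how the net captures step 2.1 ("in any order") of the informal description.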
In our example, the script ENROL-SCRIPT was "instantiated" with a value assigned to its student property. The transition GetStudent therefore immediately succeeds and activates the state gotStudent. The transition GetCourse, on the other hand, finds no value for the course property and must request a reference to a course by a take command; in this case, this command sends the message 'In which course?' to user1.


A request for the value of a property uses the 'pattern' attached to the format property of that property to generate an expression to be sent to user1. The reply has to match the pattern attached to the grammar property to be recognized and accepted. A grammar pattern decomposes a successfully matched input expression and produces a "denotation", i.e. a reference to some "internal" object(s). The denotation of an expression matched by a pattern is given by the value of the property den applied to this expression.

The user's response 'CSC 148' is then matched by the particular mini-grammar (["in"] %COURSE) and is recognized as denoting the course csc148. When both gotStudent and gotCourse are active, the transition CallEnrol is enabled and tries to evaluate the transaction ENROL.
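A request for a missing property can be sketched as a generate-then-match step. The regular expression below stands in for the mini-grammar (["in"] %COURSE); the whitespace-insensitive lookup in a small %COURSE table and the names `FORMAT`, `GRAMMAR`, `COURSE_DEN` and `request_course` are assumptions made for illustration.

```python
from __future__ import annotations

import re

# Format property of the course request (the prompt sent to the user).
FORMAT = "In which course?"
# Stand-in for the mini-grammar (["in"] %COURSE): optional 'in', then a course code.
GRAMMAR = re.compile(r"(?:in\s+)?(?P<course>[A-Za-z]+\s*\d+)", re.IGNORECASE)
# Hypothetical %COURSE lexicon: normalized printable form -> internal object.
COURSE_DEN = {"CSC148": "csc148"}


def request_course(reply: str) -> str | None:
    """Match a reply against the grammar; return its denotation or None."""
    m = GRAMMAR.fullmatch(reply.strip())
    if m is None:
        return None  # not recognized: the script would ask again
    key = re.sub(r"\s+", "", m.group("course")).upper()
    return COURSE_DEN.get(key)


print(FORMAT)                       # In which course?
print(request_course("CSC 148"))    # csc148
print(request_course("in CSC148"))  # csc148
```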


The transaction ENROL may generate some exceptions, which are defined as exceptional states in the script, and on which exception handlers can be specified. In our example, the state waitCourse is then reactivated and the user is queried again for a value of the variable course. The action ask, which applies to both types of exceptions raised by ENROL, is defined on a transition which applies to a "generalization" of these two types of exceptions. Actions which are specific to each particular exception are then defined on specializations of this transition. For our example, in case of a classFull exception, the messages 'Please select another one' and 'the class CSC148 is full' would both be sent to user1. Finally, when ENROL succeeds, the message 'OK. John Smith enrolled in CSC148.' is sent to user1.
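Attaching the shared action to a generalization and the specific messages to specializations maps naturally onto an exception class hierarchy. In this sketch, Python classes stand in for INTERPRET exception classes and transitions; the names `EnrolException`, `PrerequisiteMissing` and `handle` are hypothetical (the text names only classFull), and returning 'waitCourse' models the reactivation of that state.

```python
from __future__ import annotations


class EnrolException(Exception):
    """Generalization of the exceptions raised by ENROL."""

    def messages(self) -> list[str]:
        return ["Please select another one"]  # shared 'ask' action


class ClassFull(EnrolException):
    def __init__(self, course: str):
        super().__init__(course)
        self.course = course

    def messages(self) -> list[str]:
        # Specialization: the shared message plus its own.
        return super().messages() + [f"the class {self.course} is full"]


class PrerequisiteMissing(EnrolException):  # hypothetical second exception type
    pass


def handle(exc: EnrolException, outbox: list[str]) -> str:
    """Send the exception's messages and return the state to reactivate."""
    outbox.extend(exc.messages())
    return "waitCourse"


sent: list[str] = []
state = handle(ClassFull("CSC148"), sent)
print(state, sent)  # waitCourse ['Please select another one', 'the class CSC148 is full']
```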

IV.4. Syntactic aspects

As already introduced in the discussion of scripts, any communication with a user occurs in our application through the use of "syntactic" objects: symbols, words and complex expressions. To support the above example, we need to define at least the lexical and grammatical information shown in Appendix 3. We also need to add the following lexical information:

insert-object 'John Smith' in %STUDENT with trans ← 'john';    (c1)

insert-object 'CSC148' in %COURSE with trans ← 'csc148';       (c2)


The &ENROL grammatical class and its specializations, as defined in Appendix 3, have the effect of identifying, in an input expression, references to the arguments of the ENROL transaction. It is then up to the ENROL-SCRIPT instance to check whether it has received all the elements of information it needs and whether these elements are sufficiently complete. In the previous section, we discussed how the course argument is requested from a user to complete the enrollment.

To simplify the example, we will assume that any translation of an input expression into
an expression of the INTERPRET language is of the form:

'Instantiate ENROL-SCRIPT(<S>, <C>)'

where S is either the empty string or an instance of the lexical class %STUDENT, and
similarly, C is either empty or an instance of %COURSE. For example, the desired translation of
the input expression 'Enrol John Smith' is 'Instantiate ENROL-SCRIPT(john, )', which in turn
will evaluate to the script instance described in the previous section.
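The translation form above is essentially template filling. A minimal Python sketch, assuming plain string substitution and the empty string for a missing argument; the function name `translate_enrol` and the two dictionaries (which restate the lexical entries (c1) and (c2)) are hypothetical, not INTERPRET constructs.

```python
def translate_enrol(student_trans: str = "", course_trans: str = "") -> str:
    """Fill the template 'Instantiate ENROL-SCRIPT(<S>, <C>)'.

    Each argument is either '' (argument not given) or the 'trans' value
    of a %STUDENT or %COURSE lexical instance, e.g. 'john'.
    """
    return f"Instantiate ENROL-SCRIPT({student_trans}, {course_trans})"


# Hypothetical restatement of the lexical entries (c1) and (c2):
# surface form -> trans.
STUDENT_TRANS = {"John Smith": "john"}
COURSE_TRANS = {"CSC148": "csc148"}

print(translate_enrol(STUDENT_TRANS["John Smith"]))
# Instantiate ENROL-SCRIPT(john, )
```

Leaving a missing argument empty is what lets the ENROL-SCRIPT instance notice the gap and ask the user for it.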

Our example involves four combinations of arguments, depending on whether the
student and course arguments of ENROL-SCRIPT are given values. In addition, some variation
is allowed in the range of verbs triggering ENROL-SCRIPT, depending on the arguments
specified in an input expression. For example, only the verbs 'enrol' and 'register' are
recognized, alone or followed by a student reference, as triggering ENROL-SCRIPT; however, if a
course reference is specified, with or without a student reference, we also allow the other
instances of the lexical class %ENROL.

The various patterns required to handle these situations can be classified as follows:

&ENROL
   ⇑  ({'enrol', 'register'} [student: %STUDENT])
   ⇑  (%ENROL [student: %STUDENT] 'in' course: %COURSE)


where double arrows indicate an IS-A relationship between grammatical classes. The
corresponding translations are attached to these grammatical classes, introducing some more
special cases as needed, as described in Appendix 3. The IS-A hierarchy of the various
specializations of &ENROL is described in more detail in Figure 2.
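To make the hierarchy concrete, here is a sketch that mirrors the G-CLASS definitions of Appendix 3 with Python classes. It assumes that Python's (multiple) inheritance is an adequate stand-in for IS-A between grammatical classes; the Python class names are hypothetical renderings of the &-names. Python's method resolution order then lists a class together with all of its generalizations, most specific first.

```python
# Hypothetical Python stand-ins for the grammatical classes of Appendix 3;
# each base class plays the role of an IS-A parent.
class Enrol: ...                                          # &ENROL
class EnrolVerb(Enrol): ...                               # &ENROL-VERB
class EnrolStudent(Enrol): ...                            # &ENROL-STUDENT
class EnrolCourse(Enrol): ...                             # &ENROL-COURSE
class EnrolVerbStudent(EnrolVerb, EnrolStudent): ...      # &ENROL-VERB-STUDENT
class EnrolStudentCourse(EnrolStudent, EnrolCourse): ...  # &ENROL-STUDENT-COURSE


def generalizations(cls: type) -> list:
    """Names of cls and of every class it IS-A, most specific first."""
    return [c.__name__ for c in cls.__mro__ if c is not object]


print(generalizations(EnrolStudentCourse))
# ['EnrolStudentCourse', 'EnrolStudent', 'EnrolCourse', 'Enrol']
```

Multiple inheritance is what allows &ENROL-STUDENT-COURSE to pick up both the student handling of &ENROL-STUDENT and the course handling of &ENROL-COURSE.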

As an example of translating an input expression, 'Enrol John Smith' matches the
&ENROL-VERB specialization of &ENROL, and in turn the &ENROL-VERB-STUDENT
specialization of &ENROL-VERB; it is then analyzed as a lexical expression whose student
property has 'John Smith' for value; the s property of this lexical expression then has the value

'Instantiate ENROL-SCRIPT(<student.trans>, )'

= 'Instantiate ENROL-SCRIPT(john, )'


IV.5. Another variation

The course argument of ENROL could also be mentioned in the same input sentence as
the student argument. For example, the input sentence

Put John Smith in CSC148.

is matched by the &ENROL-COURSE specialization of &ENROL, then by
&ENROL-STUDENT-COURSE, and is finally translated into the following INTERPRET expression:

'Instantiate ENROL-SCRIPT(user1, john, csc148)';

As shown in the definition of ENROL-SCRIPT, the resulting script instance then has all the
argument values that it needs to directly evaluate ENROL, and will then terminate, unless
some exception is raised by ENROL.
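The two patterns can be approximated with regular expressions to check the translations discussed in this section. This is a minimal Python sketch, not INTERPRET's matcher. It assumes lower-cased input, hard-codes the lexical entries (c1) and (c2), and tries &ENROL-COURSE before &ENROL-VERB. The translation is built from the trans attached to &ENROL in Appendix 3, with user1 fixed and a missing argument defaulting to 'nothing'. All Python names are illustrative.

```python
import re
from typing import Optional

# Lexical classes (lower-cased), mapping each surface form to its 'trans'.
ENROL_VERBS = ["enrol", "register", "put", "take", "move", "place", "try"]  # %ENROL
STUDENTS = {"john smith": "john"}   # %STUDENT, entry (c1)
COURSES = {"csc148": "csc148"}      # %COURSE, entry (c2)


def alternatives(words) -> str:
    """Regex alternation matching exactly one of the given words."""
    return "|".join(re.escape(w) for w in words)


STUDENT = rf"(?P<student>{alternatives(STUDENTS)})"
COURSE = rf"(?P<course>{alternatives(COURSES)})"

# &ENROL-COURSE: (%ENROL [student: %STUDENT] 'in' course: %COURSE)
ENROL_COURSE = re.compile(
    rf"(?:{alternatives(ENROL_VERBS)})(?:\s+{STUDENT})?\s+in\s+{COURSE}")
# &ENROL-VERB: ({'enrol', 'register'} [student: %STUDENT])
ENROL_VERB = re.compile(rf"(?:enrol|register)(?:\s+{STUDENT})?")


def translate(sentence: str) -> Optional[str]:
    """Translate an input sentence into an INTERPRET expression, or return None."""
    text = sentence.strip().rstrip(".").lower()
    for pattern in (ENROL_COURSE, ENROL_VERB):
        match = pattern.fullmatch(text)
        if match:
            groups = match.groupdict()
            s = STUDENTS[groups["student"]] if groups.get("student") else "nothing"
            c = COURSES[groups["course"]] if groups.get("course") else "nothing"
            return f"Instantiate ENROL-SCRIPT(user1, {s}, {c})"
    return None


print(translate("Enrol John Smith"))
# Instantiate ENROL-SCRIPT(user1, john, nothing)
print(translate("Put John Smith in CSC148."))
# Instantiate ENROL-SCRIPT(user1, john, csc148)
```

With this ordering, 'Put John Smith' is rejected. The verb 'put' is accepted only when a course reference follows.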

V. Conclusions

The most important contribution that INTERPRET brings to Programming Language
(PL) research is the separation of semantic and syntactic aspects, and in turn, the modeling of
pragmatic aspects. The distinction between syntax and semantics as a practical tool in PL is, as
far as we know, novel and unique, although it is commonly used in the formal description of
programming languages, and of languages in general in philosophy and linguistics. The most
interesting consequence of this distinction is that aspects of an application other than data
and programs, previously buried as the 'environment' of an application, can now be explicitly
examined, manipulated, and even modified. This result satisfies a requirement identified in
[Hendrix et al. 79] for users to access not only their data but also their environment.

Much remains to be investigated in this direction, but INTERPRET provides the most
organized basis for safely pursuing such research.

Another major contribution of INTERPRET to PL is the uniform framework, adapted
from the TAXIS framework, based on a very small number of concepts and used to describe
all aspects of user interfaces. The basic simplicity of the resulting language is a widely
recognized goal. Yet the abstraction mechanisms integrated in the language, also borrowed
from TAXIS, allow these simple constructs to be combined to quickly represent very powerful
constructs and mechanisms, best adapted to each type of application. The organizational
power of this tool apparently has no equivalent in the PL community.

As described in [Pilote 83], a user interface can be used to access its own underlying
information, allowing a user to understand the features and limitations of the interface and
also, most importantly, to modify its capabilities. A challenging application of this principle is
that a programming language itself can be described in this way, providing explicit access not only to
identifiers defined in an application, but also to the various constructs of the language used to
trigger semantic effects. Providing access to these constructs means that these can be
examined, manipulated, and possibly changed by a programmer. In the same way as it was found
beneficial to allow end users to modify and customize the formats of their interaction
with an application themselves [Hendrix et al. 79], it is now possible for a programmer to define his
own programming language constructs (in the same way also as a mathematician often
introduces his own notation to best express some concepts). The framework inherited from
TAXIS ensures that no previously defined construct can be overridden or contradicted, but
only extended or refined to better suit particular situations.

(1) This is probably an important part of the success enjoyed by interactive languages such as APL,
which do provide significant access to the environment through a number of 'system variables and
functions'.

Finally, we would like to point out a very promising use of a language like INTERPRET
primarily as a design language, as opposed to an implementation language. INTERPRET
is ideally suited to act as the basis for an abstract model of an application, where all aspects
and elements of the application can be manipulated at a high level of abstraction, until a
satisfactory equilibrium is reached. A detailed implementation can then be produced, possibly
using a more traditional approach and language, guided by the more abstract architecture.
Some implementations of this approach are already used in practice, such as [Bassett and
Giblon 83]. In this direction probably lies the largest and most interesting potential of our
approach for practical uses of PL techniques.

VI.  Acknowledgements 

The  author  wishes  to  thank  Dr.  John  Mylopoulos  for  providing  the  framework  for  this 
research  and  numerous  discussions  that  led  to  the  results  presented  in  this  paper.  We  also 
thank  Paul  Bassett,  of  Netron  Inc.,  Toronto,  for  private  communications  that  highlighted  the 
potential  of  a  design  level  for  practical  applications.  Many  thanks  finally  to  Sol  Greenspan 
and Brian Nixon for reviewing and commenting on the 'Example' section.

VII. Bibliography

[Abrial,  J.R.,  74] 

Data Semantics, in Data Management Systems, Klimbie and Koffeman (eds.), North-Holland.

[Barron, J.L., 81]

Dialogue  and  Process  Design  for  Interactive  Information  Systems  using  TAXIS,  Tech.  Rep. 
CSRG-128,  Univ.  of  Toronto  (April  81). 

[Bassett,  P.,  and  Giblon,  J.,  83] 

Computer Aided Programming (Part I), in SOFTFAIR'83: Software Development Tools,
Techniques and Alternatives, (ACM, IEEE Comp. Soc., Nat. Bur. of Standards), Washington DC,
July 25-28; also available as Internal Memo, Netron Inc., Toronto, Feb. 1983.

[Borgida,  A.,  Mylopoulos,  J.  and  Wong,  H.K.T.,  82] 

Methodological  and  Computer  Aids  for  Interactive  Information  System  Design,  in  Automated 
Tools  for  Information  System  Design,  Schneider,  H.-J.  and  Wasserman,  A.  (editors),  IFIP, 
North-Holland. 

[Borgida,  A.,  Mylopoulos,  J.  and  Wong,  H.K.T.,  83] 

Taxonomic  Software  Specifications,  in  Perspectives  on  Conceptual  Modelling,  Brodie,  M., 
Mylopoulos,  J.  and  Schmidt  (eds.),  Springer-Verlag. 

[Gazdar,  G.,  82] 

Phrase Structure Grammar, in The Nature of Syntactic Representation, Jacobson, P.,
Pullum, G.K. (Eds.), Dordrecht, D. Reidel.

[Greenspan, S.J., Mylopoulos, J. and Borgida, A., 82]

Capturing More World Knowledge in the Requirements Specification, Proc. 6th Intern. Conf.
on  Software  Engineering,  Tokyo,  Japan,  1982. 

[Hendrix, G.G., et al., 78]

Developing  a  Natural  Interface  to  Complex  Data,  ACM  Transactions  on  Database  Systems, 
Vol. 3, No. 2, (June 1978), 105-147.




[Hendrix,  G.G.,  Sagalowicz,  and  Sacerdoti,  E.D.,  79] 

Research on Transportable English-Access Media to Distributed and Local Data Bases,
Proposal for Research to DARPA No.ECU 79-103, (Nov. 1979), SRI International.

[Hoare, C.A.R., 78]

Communicating Sequential Processes, Comm. ACM, Vol. 21, No. 8, Aug. 1978, pp. 666-677.

[Lientz, B.P. and Swanson, E.B., 81]

Problems in Application Software Maintenance, Comm. of the ACM, 24, 11 (Nov. 1981),
pp.763-769. 

[Morris,  C.,  38] 

Foundations  of  the  Theory  of  Signs,  in  International  Encyclopedia  of  Unified  Science, 
Vol. 1, No. 2, Univ. of Chicago Press.

[Mylopoulos, J., Bernstein, P. and Wong, H.K.T., 80]

A Language Facility for the Design of Interactive Database-Intensive Applications, Trans. on
Database  Systems,  Vol.5,  No.2,  June  1980,  pp.185-207;  also  Tech.Rep.  CSRG-105,  Univ.  of 
Toronto,  July  1979. 

[Mylopoulos, J. and Wong, H.K.T., 80]

Some Features of the TAXIS Data Model, Proc. Very Large Data Base Conference, Montreal,
Canada.

[Pilote,  M.  83a] 

A  Framework  for  the  Design  of  Linguistic  User  Interfaces,  PhD.  Thesis,  Dept.  Computer 
Science,  Univ.  of  Toronto,  CSRG  Technical  Note  #32,  Jan.  1983. 

[Stoy,  J.,  77] 

Denotational Semantics: The Scott-Strachey Approach to Programming Language Theory,
MIT  Press. 

[Tsichritsis,  D.  &  Lochovsky,  F.,  81] 

Data  Models,  Prentice-Hall,  1981. 

[Zisman, M.D., 77]

Representation, Specification, and Automation of Office Procedures, PhD. Thesis, Dept. of
Decision  Science,  The  Wharton  School,  Univ.  of  Penn.,  Sept.  77. 




Appendix 1 - Data and Program Definition

DATA-CLASS PERSON with
    properties
        name: STRING;
end

DATA-CLASS STUDENT isa PERSON with
    properties
        courses: setof COURSE;
end

DATA-CLASS COURSE with
    properties
        course#: {100..2999};
        dept: STRING;
        size, max-size: NATURAL-NUMBER;
end

TRANSACTION-CLASS ENROL with
    parameters
        enrol: (student, course);
    locals
        student: STUDENT;
        course: COURSE;
    conditions
        enough-place?: course.size < course.max-size
            exc classFull;
        not-already-taking?: not (course in student.courses)
            exc alreadyTaking;
    actions
        add-course: student.courses <- course;
        increase-course-size: course.size <- course.size + 1;
end


Notation 

-  the  class  PERSON  is  an  instance  of  the  metaclass  DATA-CLASS; 

-  the  property  'name'  is  an  instance  of  the  property  category  properties; 

- 'name' takes an instance of PERSON as argument to produce a string;

-  the  class  STUDENT  is  a  subclass  of  PERSON; 

-  the  property  courses  is  multi-valued; 

- {100..2999} is a range class of all integers between 100 and 2999;

-  the  value  of  both  properties  size  and  max-size  is  a  natural  number; 

-  an  instance  of  ENROL  is  uniquely  identified  by  a  student  and  a  course; 

-  ’enough-place?’  is  the  name  of  a  condition  which  must  be  satisfied; 

- 'classFull' identifies an exception which is raised if enough-place? is false;

-  the  value  of  course  is  added  to  the  value  of  courses  on  student; 

-  the  value  of  size  on  course  is  incremented  by  1. 
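For readers more familiar with conventional languages, the ENROL transaction can be sketched in Python under some assumptions. Python exceptions stand in for the TAXIS exception classes, and dataclasses stand in for the data classes. Both conditions are checked before either action is performed. The names `number` (for `course#`), `WrongCourse` (borrowed from the exception state in Appendix 2) and the token "Mary" are illustrative.

```python
from dataclasses import dataclass, field


class WrongCourse(Exception):
    """Common generalization of both ENROL exceptions (wrongCourse in Appendix 2)."""


class ClassFull(WrongCourse):
    """Raised when the condition enough-place? is false."""


class AlreadyTaking(WrongCourse):
    """Raised when the condition not-already-taking? is false."""


@dataclass
class Person:
    name: str


@dataclass
class Student(Person):                            # STUDENT isa PERSON
    courses: set = field(default_factory=set)     # setof COURSE


@dataclass(eq=False)                              # identity equality keeps courses hashable
class Course:
    dept: str
    number: int                                   # course#, range class {100..2999}
    max_size: int
    size: int = 0

    def __post_init__(self) -> None:
        if not 100 <= self.number <= 2999:
            raise ValueError("course# must be in {100..2999}")


def enrol(student: Student, course: Course) -> None:
    """ENROL: check both conditions, then perform both actions."""
    if not course.size < course.max_size:         # enough-place?
        raise ClassFull(course.number)
    if course in student.courses:                 # not-already-taking?
        raise AlreadyTaking(student.name, course.number)
    student.courses.add(course)                   # add-course
    course.size += 1                              # increase-course-size


csc148 = Course("CSC", 148, max_size=1)
enrol(Student("John Smith"), csc148)
try:
    enrol(Student("Mary"), csc148)
except WrongCourse as exc:
    print(type(exc).__name__)   # ClassFull
```

Catching `WrongCourse` corresponds to a handler attached to the generalization of both exceptions. Catching `ClassFull` or `AlreadyTaking` corresponds to the specialized handlers.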
Appendix 2 - Student Enrollment Script

SCRIPT-CLASS ENROL-SCRIPT: (start, user -> end) with
    locals
        user: TERMINAL-CODE;
        student: STUDENT with locals
            format: 'Which student?', grammar: %STUDENT; end;
        course: COURSE with locals
            format: 'In which course?', grammar: (['in'] %COURSE); end;
        cancel: {true} with locals
            format: NONE, grammar: 'Bye' with den default true; end;
    states
        start, waitStudent, waitCourse, waitCancel,
        gotStudent, gotCourse, end: STATE;
        wrongCourse: EXCEPTION-STATE;
        classFull, alreadyTaking isa wrongCourse;
    transitions
        Initialize: (start -> waitStudent, waitCourse, waitCancel);

        GetStudent: (waitStudent -> gotStudent) with
            actions
                if student = nothing
                    then ask-which-student: take(user, student);
        end

        GetCourse: (waitCourse -> gotCourse) with
            actions
                if course = nothing
                    then ask-which-course: take(user, course);
        end

        Cancel: (waitCancel -> end) with
            actions
                ready-for-cancel: take(user, cancel);
        end

        CallEnrol: (gotStudent, gotCourse -> end) with
            actions
                try-to-enrol: ENROL(student, course)
            exchandler for wrongCourse, gotStudent Is waitClass with
                actions
                    reset: course <- nothing;
                    ask: give(user, 'Please select another one.');
            end;
            exchandler for classFull with
                actions explain: give(user, 'the class <course> is full.');
            end;
            exchandler for alreadyTaking with
                actions explain: give(user, '<student> is already taking <course>.');
            end;
            feedback: give(user, 'OK. <student> enrolled in <course>.');
        end

end ENROL-SCRIPT
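Scripts of this kind are based on augmented Petri nets. For example, CallEnrol can fire only when both of its input states, gotStudent and gotCourse, hold a token. The following minimal Python sketch models only that token game. It assumes one token per state and ignores actions, the `user` argument and exception handlers. It also does not model what happens to the leftover waitCancel token when the script reaches `end`. `TRANSITIONS`, `enabled` and `fire` are illustrative names, not INTERPRET constructs.

```python
from collections import Counter

# transition -> (input states, output states), copied from ENROL-SCRIPT
TRANSITIONS = {
    "Initialize": (["start"], ["waitStudent", "waitCourse", "waitCancel"]),
    "GetStudent": (["waitStudent"], ["gotStudent"]),
    "GetCourse": (["waitCourse"], ["gotCourse"]),
    "Cancel": (["waitCancel"], ["end"]),
    "CallEnrol": (["gotStudent", "gotCourse"], ["end"]),
}


def enabled(marking: Counter, name: str) -> bool:
    """A transition is enabled when every input state holds a token."""
    inputs, _ = TRANSITIONS[name]
    return all(marking[state] > 0 for state in inputs)


def fire(marking: Counter, name: str) -> Counter:
    """Consume a token from each input state, add one to each output state."""
    if not enabled(marking, name):
        raise ValueError(f"{name} is not enabled")
    inputs, outputs = TRANSITIONS[name]
    new = marking.copy()
    new.subtract(inputs)
    new.update(outputs)
    return +new   # drop states whose count fell to zero


m = Counter(["start"])
for t in ["Initialize", "GetStudent", "GetCourse", "CallEnrol"]:
    m = fire(m, t)
print(sorted(m.elements()))   # ['end', 'waitCancel']
```

Because GetStudent and GetCourse have independent input states, they can fire in either order. This is how the script tolerates the student and course arriving separately or together.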
Appendix 3 - Lexical and Grammatical Classes


G-CLASS &ENROL with
    locals
        s, c: %EXPRESSION default 'nothing';
        trans: 'Instantiate ENROL-SCRIPT(user1, <s>, <c>)';
end

G-CLASS &ENROL-VERB isa &ENROL
    := ( {'enrol', 'register'} [student: %STUDENT] );

G-CLASS &ENROL-STUDENT isa &ENROL with
    locals
        student: %STUDENT total;
        s default student.trans;
end

G-CLASS &ENROL-COURSE isa &ENROL
    := ( %ENROL [student: %STUDENT] 'in' course: %COURSE ) with
    locals
        c default course.trans;
end

G-CLASS &ENROL-VERB-STUDENT isa &ENROL-VERB, &ENROL-STUDENT;

G-CLASS &ENROL-STUDENT-COURSE isa &ENROL-STUDENT, &ENROL-COURSE;

L-CLASS %ENROL := {'enrol', 'register', 'put', 'take', 'move', 'place', 'try'}

L-CLASS %PERSON with locals den: PERSON; end

L-CLASS %STUDENT isa %PERSON with locals den: STUDENT; end;

L-CLASS %COURSE with locals den: COURSE; end

[Figure 2 - Grammatical Classes: &ENROL-VERB, &ENROL-STUDENT, &ENROL-COURSE; below them &ENROL-VERB-STUDENT, &ENROL-STUDENT-COURSE]




A  Data  Modeling  Approach 
to  Simplify  the  Design  of  User  Interfaces 

Michel Pilote

Department  of  Computer  Science 
University  of  Toronto 
Toronto,  CANADA 


ABSTRACT 

What is most crucially lacking in the area of user interface design is a set of tools to
integrate  various  mechanisms  that  are  well  understood  and  useful,  but  each  addressing  only 
limited  aspects  of  the  problem. 

This  paper  demonstrates  how  Data  Modeling  techniques  can  greatly  simplify  the  design 
of  user  interfaces.  The  main  contribution  of  this  work  is  a  single,  highly  coherent  and  very 
simple framework that can uniformly represent any aspect of a user interface.

The  most  significant  departure  of  our  approach  from  other  programming  languages  is 
the  complete  and  explicit  separation  that  we  make  between  semantic  and  syntactic  aspects  of 
an application. We also introduce internal objects to model various states of an application
and  the  various  state  transformations  that  are  allowed  between  these  states. 


I.  Motivation 

More  than  50%  of  the  code  in  most  commercial  applications  involves  the  definition  of 
interfaces  with  the  users  of  these  applications;  furthermore,  this  part  of  the  code  is  the  one 
that  typically  requires  the  most  customization  and  maintenance,  to  reflect  changes  in  users' 
requests  and  methods  of  operation.  Any  simplification  and  reduction  of  this  portion  of 
software  development  is  therefore  critically  needed  [Lientz  and  Swanson  81]. 

A  user  interface  can  be  defined  as  any  mechanism  used  to  mediate  between  users  and 
applications.  These  include  in  the  simplest  case  all  Input/Output  formatting,  up  to  the  most 
complex  cases  of  so-called  'natural  language'  front-ends  to  data  management  systems. 
Although  our  approach  could  also  be  applied  to  other  (graphical  or  hardware  oriented)  types 
of  user  interfaces,  we  concentrate  our  research  on  word-oriented  user-interfaces  in  this  work. 

In  this  range  of  interfaces,  going  from  rigid  input  formats  and  command  languages  to 
systems  trying  to  make  sense  of  arbitrary  English  expressions,  the  latter  obviously  presents  a 
more  difficult  problem.  But  any  solution  to  this  larger  problem  can  also  be  applied  to 
represent  simpler  mechanisms.  The  following  discussion  concentrates  on  the  problems  and 
issues involved in building the more sophisticated types of user interfaces, with the
understanding that these solutions also apply to simpler cases. We will, however, restrict our
attention to natural language systems used in practical situations today, as opposed to the more
advanced ones still under research.

II.  Overview 

This  paper  demonstrates  how  Data  Modeling  techniques  can  greatly  simplify  the  design 
of  user  interfaces. 

The  first  simplification  is  to  break  down  the  problem  into  three  components:  semantics, 
syntax  and  pragmatics,  according  to  a  distinction  familiar  to  linguists.  We  approach  the  design 
problem  for  interactive  information  systems  from  a  knowledge  representation  point  of  view. 
To us, such a system is above all a knowledge base of 'facts'. Some involve the outside world,
its entities, their interrelations, the events they participate in and their histories ('semantic'
knowledge). Others are about the grammar and the lexicon used for a particular user
interface ('syntactic' knowledge). Yet others are about the dialogue structure the system is
expected to support ('pragmatic' knowledge).

Proceedings, International Conference on Very Large Data Bases,
Florence, Italy, October 1983.

A second simplification is to re-use as much as possible of the syntactic means with which a
programmer typically specifies programs and data definitions, to automatically
provide  a  working  prototype  of  a  user  interface.  This  approach  is  made  possible  by  a  Data 
Modeling  framework  which  uniformly  represents  all  three  aspects  of  a  user  interface. 

This opens the way to a third simplification, which is to defer to eventual users of the
interface the definition of any syntactic variation on top of the generated prototype. This might
include  defining  more  English-like  front  ends  with  English  words  and  idioms,  or  conversely, 
defining  shorthands  and  codes  for  often-used  input  messages. 

III. Related work

Among  systems  for  natural  language  access  to  data  bases  that  perform  well  enough  to 
deserve  consideration  as  practical  systems,  we  find  LIFER  [Hendrix  et  al.  78],  LUNAR 
[Woods  77],  PLANES  [Waltz  78],  SOPHIE  [Burton  and  Brown  76],  REL  [Thompson  and 
Thompson  78],  RENDEZ-VOUS  [Codd  et  al.  78]  and  TQA  [Plath  76,  Petrick  81].  These 
research projects have already led to commercial products, starting with Harris' INTELLECT
[Harris 78] at Artificial Intelligence Corporation, and now followed by Gary Hendrix's
STRAIGHT TALK at Symantec and Roger Schank's work at Cognitive Systems. The
state-of-the-art in non-experimental natural language systems, as recognized by [Barr and
Feigenbaum 79], is still very much typified by INTELLECT and LIFER.

III.1. Problems and Issues in User Interfaces

The  experience  with  users  of  LIFER  applications,  as  reported  in  [Hendrix  et  al.  79],  has 
demonstrated that a number of facilities are required to obtain more 'natural' user-interfaces.
The  most  important  of  these  desired  facilities  are: 

1.  a  syntactically  motivated  grammar; 

2. a more complete and flexible paraphrase mechanism;

3.  meta-knowledge  access; 

4. a uniform representation for all components of a user-interface.

An important goal of current research in user interfaces is transportability: to enable
non-specialists to adapt a natural language processing system for access to an existing conventional
data base. Because of the near-impossibility of transporting LIFER grammars from one
domain to another, the LIFER development team has moved, in their subsequent research
[Hendrix et al. 79], toward developing linguistically motivated grammars, which would
facilitate the transfer from one domain to another. [Robinson 82] is the most recent account of
these efforts. However, the use of linguistically motivated grammars introduces new
problems. 'The root of these problems lies in the very uniformity of syntactic coverage that makes
linguistically motivated grammars transportable and resistant to gaps in coverage. In
particular, a uniform treatment of syntax demands a uniform semantic system' [Hendrix et al. 79], i.e.
a uniform treatment of syntax and semantics, not only for data, but for all aspects of the
system.

Also, 'users want more than just access to the data actually recorded in their data bases;
this has been shown in [Tennant 79], where a distinction is made between linguistic
completeness and conceptual completeness' [Hendrix et al. 79]. According to Hendrix, the most
promising approach to this problem involves the creation of an intermediate representation
level that mediates between the language processor and the various resources available.

A significant reduction in development and maintenance efforts could be achieved if the
users themselves could specify the particular ways in which they would prefer to use their
applications. 'System users and members of the academic community are in general
agreement that one of the most interesting and useful feature of (LIFER-based systems) is their
ability to be taught new syntactic constructs by ordinary users at run time' [Hendrix et al. 79].

A  related  approach  is  to  minimize  the  basic  core  of  rules  that  is  essential  to  process  the 
user’s  initial  lexicon  entries  and  rules,  as  illustrated  in  [Shapiro  and  Neal  82],  so  that  'a  user 
could  then  input  rules  and  assertions  to  enhance  the  system’s  capabilities  to  acquire  both 
linguistic  and  non-linguistic  knowledge.  In  other  words,  the  user  will  define  his  own  input 
language  for  entering  knowledge  into  the  system  and  conversing  with  the  system.' 

Wording the system's questions in a comprehensible form involves a number of human
engineering difficulties, as reported in [Grosz 82]. An important issue is therefore, as
expressed in [Haas and Hendrix 80], that 'a set of readily understandable questions is needed
for eliciting information from tutors. The length and number of questions should be
minimized to impose as small a burden on tutors as possible.'

A common issue faced by all of the above-mentioned existing or proposed natural
language interfaces is the problem of 'complexity': the barrier imposed by the sheer number of
details to handle. Various proposals to control this complexity revolve around the notion of
abstractions, and organization of abstractions into hierarchies and network structures. These
tools apply as much to the concepts that we want to represent with natural language as to the
programming constructs needed to implement the support for handling natural language
interfaces.

[Winograd  79]  provides  a  good  summary  of  what  needs  to  be  done  to  achieve  a  higher 
level  programming  system  better  suited  to  develop  and  maintain  complex  applications  like 
natural  language  interfaces: 

A  higher  level  programming  system  must  emphasize  the  use  of  descriptive  languages  for  communication,  with 
the  ability  to  create  and  manipulate  descriptions  in  an  effective,  understandable  way.  Existing  formalisms  for 
description  (e.g.  predicate  calculus)  are  clear  and  well  understood,  but  lack  the  richness  typical  in  descriptions 
which  people  find  useful.  They  can  serve  as  a  universal  basis  for  description  but  only  in  the  same  sense  that  a 
Turing  machine  can  express  any  computation.  They  lack  the  higher  level  structuring  which  makes  it  possible  to 
manipulate  descriptions  at  an  appropriate  level  of  detail. 

- T. Winograd, Beyond Programming Languages, 1979

Only a few of the above issues are addressed in this paper, but a more complete
treatment can be found in [Pilote 83a].

IV.  Our  Approach 

IV.1. Overview

As argued in [Pilote 83b], most programming languages offer very few constructs to
explicitly deal with the design of user interfaces. We believe that what is most crucially
lacking in this area of user interface design is a set of tools to integrate various mechanisms that
are well understood and useful, but each addressing only limited aspects of the problem.

The  essence  of  our  approach  is  a  uniform  representation  framework,  able  to  describe  and 
provide  access  to  its  own  definition.  We  call  this  representation  framework  INTERPRET,  to 
express the essence of our approach, which consists of interpreting (or translating) a
user-oriented notation into a machine understandable one, and also to acknowledge a strong
influence  from  the  so-called  'interpreter-oriented'  methods  of  describing  the  formal  semantics 
of  programming  languages,  such  as  Denotational  Semantics  [Gordon  79],  on  the  design  of  the 
INTERPRET  language. 

We limit ourselves to mechanisms allowing primarily access to data and programs already
stored on a computer. By this we mean that we view the role of a user interface as providing
access to data and programs previously defined by a professional programmer using a
programming language, as opposed to allowing users to define new data types and programs directly
through an interface. We conjecture that this restriction significantly reduces the
number of possible intents in user utterances.
As for the Data Modeling foundation of our work, we used TAXIS [Mylopoulos et al.
80] as a starting point since, among a 'bewildering variety of knowledge representations ... one
of the most complete is the TAXIS system which has aspects of all basic kinds of knowledge'
that can be distinguished in the current projects and approaches to knowledge representation
[Sowa 80]. Because of its strong roots in Artificial Intelligence, TAXIS is acknowledged as
'rich enough to support a natural language interface to knowledge-based systems' [Sowa 80].
INTERPRET, described in detail in [Pilote 83a], simplifies and extends TAXIS [Mylopoulos et
al. 80] to allow for the description of all aspects of user-interfaces.

IV.2.  Basic  representation  framework 

As in TAXIS, the INTERPRET framework considers three basic types of objects:
tokens, classes and metaclasses. Tokens are undecomposable units of information, usually
modeling actual entities in an application domain. Classes correspond to collections of tokens
sharing some common "properties" (to be defined below), which tokens are said to be instances
of the class; this INSTANCE-OF relationship relates an object, e.g. John, to a class of which it
is an instance, e.g. PERSON. Similarly, collections of classes can themselves be grouped into
higher level classes, called metaclasses.

All (meta)classes constituting a TAXIS or INTERPRET program are organized into an
IS-A hierarchy in terms of the binary relation IS-A, which is a partial order. This IS-A
relationship will sometimes be referred to as "specialization" when going from more to less
general, or conversely, "generalization". The IS-A relationship relates a class, e.g. STUDENT, to
another more general one, e.g. PERSON. In particular, relations, transactions and exceptions
are all treated as classes defined through the properties that relate them to other classes, and
organized in terms of the IS-A relation into a hierarchy.

The main difference between the IS-A relation of TAXIS and INTERPRET and the traditional subset
relationship is that the IS-A relation holds even between classes with no instances. The subset
relation between the sets of instances of IS-A-related classes is thus simply a side-effect of the
definition of a particular IS-A relation.
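The point that IS-A is declared independently of instances, with the subset relation between extents derived from it, can be illustrated with a small Python sketch. It assumes a dictionary-based schema. The class GRADUATE and the token Mary are hypothetical examples, not taken from the paper. IS-A is computed here as a reflexive-transitive closure, consistent with its being a partial order.

```python
from typing import Dict, Set

# Illustrative schema: class -> its direct IS-A parents.
IS_A: Dict[str, Set[str]] = {
    "PERSON": set(),
    "STUDENT": {"PERSON"},
    "GRADUATE": {"STUDENT"},      # hypothetical class declared with no instances
}
# INSTANCE-OF facts: token -> classes it was declared an instance of.
INSTANCE_OF: Dict[str, Set[str]] = {"John": {"STUDENT"}, "Mary": {"PERSON"}}


def generalizations(cls: str) -> Set[str]:
    """cls plus every class reachable through IS-A (reflexive-transitive closure)."""
    seen: Set[str] = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(IS_A[current])
    return seen


def is_a(sub: str, sup: str) -> bool:
    return sup in generalizations(sub)


def extent(cls: str) -> Set[str]:
    """Tokens that are instances of cls, directly or through a specialization."""
    return {token for token, classes in INSTANCE_OF.items()
            if any(is_a(c, cls) for c in classes)}


print(is_a("GRADUATE", "PERSON"), extent("GRADUATE"))   # True set()
print(extent("STUDENT") <= extent("PERSON"))             # True
```

GRADUATE IS-A PERSON holds although GRADUATE has no instances. The subset relation between the extents of STUDENT and PERSON is never stored. It follows from the IS-A declarations.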

Classes  and  metaclasses  model  conceptual  objects  which  are  "defined"  by  their  relations 
to  other  concepts,  and  the  operations  that  are  allowed  on  their  instances,  much  in  the  spirit  of 
"Abstract  Data  Types*  in  Programming  Languages.  Both  these  relations  and  operations  are 
viewed  as  definitional  properties  attached  to  (meta)classes.  These  definitional  properties  res¬ 
trict  the  factual  properties  that  can  be  defined  on  instances  of  these  classes.  For  example, 

property  age  on  PERSON  Is  {0..200} 

specifies that the age of a particular person, say John, must be in the range {0..200}. The following
expression is then acceptable:

John.age ← 22

meaning  that  the  value  of  the  property  age,  when  applied  to  the  object  John,  becomes  the 
number  22. 
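The interplay between a definitional property and the factual properties it restricts can be sketched in Python as follows. The classes `RangeProperty` and `Token` and the method `set` are hypothetical illustrations, not INTERPRET constructs; an assignment outside the declared range is rejected.

```python
class RangeProperty:
    """Definitional property attached to a class, e.g. age on PERSON is {0..200}."""

    def __init__(self, name, low, high):
        self.name, self.low, self.high = name, low, high

    def check(self, value):
        """Return `value` if it lies in the declared integer range, else raise."""
        if not isinstance(value, int) or not self.low <= value <= self.high:
            raise ValueError(
                f"{self.name} must be in {{{self.low}..{self.high}}}, got {value!r}")
        return value


class Token:
    """An instance (token) whose factual properties are checked against its class."""

    def __init__(self, name, definitional):
        self.name = name
        self.definitional = definitional   # property name -> RangeProperty
        self.facts = {}                    # factual properties of this token

    def set(self, prop, value):
        # Analogue of the INTERPRET assignment  John.age ← 22
        self.facts[prop] = self.definitional[prop].check(value)


PERSON_PROPERTIES = {"age": RangeProperty("age", 0, 200)}
john = Token("John", PERSON_PROPERTIES)
john.set("age", 22)
```

A later `john.set("age", 250)` would raise `ValueError` and leave the stored fact unchanged, mirroring the restriction of factual properties by definitional ones.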

A new feature of INTERPRET over TAXIS is that properties are treated as objects. This
means that a property category is itself a class of objects in INTERPRET, whose instances are
properties. The most general class of properties is called "property" (or alternatively "properties"),
of which all other property categories are specializations.
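Treating properties as objects can be sketched in Python: each property is an instance of a property category, and categories are themselves organized by specialization under a root called "property". The names `PropertyCategory` and `Property`, and the use of "attributes" and "characteristics" (borrowed from the keywords of Appendix 1) as categories, are assumptions for illustration; for brevity each category has a single generalization.

```python
class PropertyCategory:
    """A class whose instances are properties (single-parent sketch)."""

    def __init__(self, name, isa=None):
        self.name, self.isa = name, isa

    def specializes(self, other):
        """True if this category is `other` or one of its specializations."""
        cat = self
        while cat is not None:
            if cat is other:
                return True
            cat = cat.isa
        return False


class Property:
    """A property is itself an object: an instance of some property category."""

    def __init__(self, name, owner, value_class, category):
        self.name = name
        self.owner = owner              # class the property is attached to
        self.value_class = value_class  # class of its values
        self.category = category

    def is_instance_of(self, category):
        return self.category.specializes(category)


PROPERTY = PropertyCategory("property")
ATTRIBUTES = PropertyCategory("attributes", isa=PROPERTY)
CHARACTERISTICS = PropertyCategory("characteristics", isa=PROPERTY)

all_properties = [
    Property("name", "PERSON", "%Token", ATTRIBUTES),
    Property("si#", "PERSON", "%DIGIT*9", CHARACTERISTICS),
]
# Because properties are ordinary objects, they can be queried like any data.
attribute_names = [p.name for p in all_properties if p.is_instance_of(ATTRIBUTES)]
```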

IV.3. Three-step design methodology

The  design  of  a  user  interface  can  first  be  simplified  by  decomposing  the  problem  into 
separate  subcomponents.  The  Example  section  below  illustrates  the  design  methodology  that  is 
made  possible  by  the  uniform  representation  of  all  aspects  of  a  user  interface: 
1. Define the semantic data objects and programs;

2. Define the valid dialogue paths, and specify responses to exceptional situations;

3. Add a syntactic covering on top of the above facilities, aiming at making them more
"natural"; following the approach advocated in this research, this step could even be
handled by a "casual" user.

The decomposition of a user interface into three components is particularly significant in
reducing the effort involved in setting up new user interfaces for casual users. The proposed
methodology, which as far as we know is original and unique, is to use as much information
as possible from an initial specification by a programmer, to the extent of being able, already at
this point, to provide quite "flexible" facilities to a "non-technical" user. This working basis can
then be successively extended in accordance with the syntactic preferences of the user, which may
be more English-like, or even more formal and abbreviated if so wished.

As  a  result  of  these  simplifications,  the  task  of  designing  a  user  interface  can  in  the 
extreme  case  be  reduced  to  the  specification  of  dialogue  paths. 

Note  that,  in  a  sense,  the  mechanisms  of  the  syntactic  and  pragmatic  components,  since 
they  must  be  represented  and  stored  as  data  and  programs  in  a  computer,  are  also  part  of 
what we called the semantic component. In fact, one of the prime goals of this work is to provide
access to these program and data objects, using the same mechanism and procedures that
are  used  for  more  traditional  data  and  programs  in  data  bases  and  program  libraries.  But  for 
the  sake  of  the  discussion,  it  will  be  useful  to  distinguish  the  accessing  mechanisms,  classified 
into  syntactic  and  pragmatic,  from  the  representation  of  information  stored  in  the  computer. 

IV.4. Maximize the use of predefined information

The second way in which we simplify the task of designing a user interface is by taking
advantage of the information already provided by a programming language specification. This
information consists of two parts: 1) identifiers; 2) a grammar for the programming language.
By replacing the programming language grammar with a more English-like one, we already
obtain a more flexible and user-friendly interface. A wide variety of user inputs in
"natural" language can then be translated into a formal equivalent that can be accepted by the system.

The  approach,  illustrated  in  the  Example  section  below,  of  explicitly  and  systematically 
using  syntactic  information  provided  by  programming  language  specifications  to  support  a  user 
interface  appears  to  be  new;  this  approach  is  made  possible  by  our  uniform  framework  which 
can  combine  the  representation  of  "internal"  information  with  "external",  user-defined 
knowledge. 

IV.5. Deferring syntactic customization to the user

Finally, a third way to reduce the job of the user interface designer is to defer part of
this job to a user of the resulting interface. First, by representing much of the above mechanism
in a sufficiently organized formalism (i.e. in terms of INTERPRET constructs), we
increase the comprehensibility of the interface and reduce the effort required for a user to
understand it and introduce further modifications himself. This involves primarily modifying
the syntactic component of a user interface to fit particular preferences, since the semantic
and pragmatic aspects of a user interface are supposed to be defined by professional programmers.
For example, one programmer may like a very concise and dense notation, while
another will prefer full-length words, with lots of prompting from the system. Understanding
the underlying information is very important to make modifications possible, but is even more
critical to "debug" and integrate modifications into the rest of the system.

The next step, illustrated in our example, is to increase the flexibility of the vocabulary
by introducing synonyms for already defined identifiers and custom paraphrases besides standard
English transformations. The motivation for a syntactic training mechanism is that users
need to be able to adapt the syntax of an interface to the particular vocabulary of an
application, and to their own particular tastes. These are too diverse to rely on computer or
linguistic experts to provide the required changes. Furthermore, customizing is believed, from
the author's personal experience with "end-users" of computer systems, to be one of the most
important aspects of a successful user interface.

Current  solutions  to  this  requirement  for  the  acquisition  of  new  syntactic  knowledge 
typically  involve  long-winded  dialogues  to  gather  the  syntactic  classification  and  features  of 
new  words;  alternatively,  such  information  is  often  explicitly  given  via  a  programming 
language.  The  first  method  is  likely  to  strain  the  patience  of  its  users  while  the  second 
requires  a  deep  familiarity  with  programming  and  linguistics. 

Our approach here clearly belongs to the "language engineering" stream: it is entirely based
upon conventions between users and programs. Our main innovation in this respect is to
allow the customization of an interface, as shown below; we also describe in [Pilote 83a] techniques
that could be used to describe and access the interface itself, to allow a user to understand
its features and limitations and, eventually, to modify the interface itself.

V. Example of the three-step design of a user interface

The last section of this paper presents a highly simplified example of the kinds of information
that must be incorporated in a user interface to make it truly "flexible". Although restricted,
this example also illustrates the complexity of the phenomena to be handled, most of
which are only skimmed over by many interfaces aiming at "user-friendliness".

Our sample application domain is the institutional world of a university, and our particular
example is the identification of a particular student. The first step of an interaction
between a user and the system consists of the user answering a request from the system to
identify himself, from which the system will decide which data and programs can be made
available to this user. We describe how, even for such a simple situation, the number of possible
responses is unbounded. However, by taking advantage of the rules offered by a built-in
English grammar, the number of "patterns" required to match most of these user inputs is
very limited.

We will first aim at a minimal mechanism able to produce a particular desired result,
without any concern for user-oriented features. Our interface will include barely enough
information  to  support  the  semantics  of  an  application.  The  important  point  is  that,  already 
after  the  first  step,  the  user  is  provided  with  a  working  system  which  can  be  evaluated  against 
his  requirements. 

Then, in a second step, we will specify dialogue paths as an extension of the semantic
component of our interface. These dialogue paths will specify at any time the range of possible
actions that can be triggered in the system or the items of information that must be
obtained by the system from the user.

Thirdly and finally, we will define a syntactic interface for the above facilities that will
allow access either in a programming-language-like format or in a more English-like,
"natural" fashion. This last step illustrates that artificial and natural languages are not incompatible
but can, in fact, be intermixed according to the user's needs and preferences.

Appendices  1-4  collect  the  detailed  and  complete  INTERPRET  declarations  supporting 
the  following  example. 

V.1. The semantic component

The  semantic  component  for  our  example  includes  a  data  class  representing  student 
information,  as  shown  in  Appendix  1.  Graphically,  our  semantic  component  includes  the  data 
classes  and  tokens  shown  in  Figure  1. 

Instances  of  the  classes  PERSON  and  STUDENT  are  data  objects  with  a  number  of 
attributes.  Using  only  semantic  facilities  (with  a  standard  programming  language  syntax  - 
INTERPRET  in  this  case)  we  can  explicitly  refer  to  instances  of  these  classes  with  expressions 
like: 
PERSON (si#, address, name, phone#)
        ⇑ IS-A
STUDENT (si#, address, name, phone#, student#, faculty, year, status)
        ↑ INSTANCE-OF
John ('123456789', '37 Purdon Dr., Toronto', 'John Smith', '412-7841',
      '007812345', 'Arts and Science', 1)

Mary ('234567899', '37 Purdon Dr., Toronto', 'Mary Smith', '412-7841'):
      INSTANCE-OF PERSON

Fig. 1. Semantic objects defined in Appendix 1; double arrows represent the IS-A relationship, and single arrows represent INSTANCE-OF.


John ← the STUDENT
    with name = 'John Smith';

where John is an identifier assigned the value of the INTERPRET expression following the
arrow. An expression like 'variable-name ← the CLASS with property = value' can be viewed
(in terms more familiar to many readers) as a database "query".

If this request is not sufficient to identify uniquely an instance of STUDENT, the
INTERPRET construct "the" will raise the exception MORE-THAN-ONE. The user then has
to examine the environment, perhaps querying the class STUDENT to examine the multiple
instances named 'John Smith', to be able finally to pinpoint a combination of properties that
identifies the desired individual. To get a unique instance of STUDENT, the user may end
up having to re-enter a more complete expression like:

John ← the STUDENT with name = 'John Smith',
                        address = '37 Purdon Dr., Toronto';
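The behaviour of the construct "the" can be approximated by a small Python function that returns the unique instance satisfying the given property values and raises an exception when several match. This is a sketch under stated assumptions: `the`, `MoreThanOne`, the key `student_no` (standing in for student#) and the second 'John Smith' record are hypothetical, and since the text does not name an exception for an empty result, the sketch simply uses Python's `LookupError`.

```python
class MoreThanOne(Exception):
    """Raised when a 'the CLASS with ...' expression matches several instances."""


def the(instances, **conditions):
    """Return the single instance whose properties equal all given values."""
    matches = [i for i in instances
               if all(i.get(prop) == value for prop, value in conditions.items())]
    if len(matches) > 1:
        raise MoreThanOne(matches)
    if not matches:
        raise LookupError(f"no instance with {conditions}")
    return matches[0]


STUDENTS = [
    {"name": "John Smith", "address": "37 Purdon Dr., Toronto",
     "student_no": "007812345"},
    # Hypothetical namesake, added only to trigger MORE-THAN-ONE.
    {"name": "John Smith", "address": "12 College St., Toronto",
     "student_no": "007899999"},
]

try:
    john = the(STUDENTS, name="John Smith")
except MoreThanOne:
    # As in the text, the user re-enters a more complete expression.
    john = the(STUDENTS, name="John Smith", address="37 Purdon Dr., Toronto")
```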


V.2. The pragmatic component
V.2.1.  Definitions 

A user could obtain essentially the same result as in the above semantic component more
simply by allowing the program to guide the interaction and explicitly ask for needed information.
This effect is obtained via "scripts", which basically specify the allowable successions of
input/output interactions between the user and the program. INTERPRET scripts simplify
and extend TAXIS scripts [Barron 80], which were inspired by Zisman's Augmented Petri Nets
[Zisman 77]. A script to support and complement the above semantic component is shown in
Appendix 2. This "master" script can help identify unique instances of the class STUDENT. It directs a
complete session with a user, and is activated by some unspecified means. The first action of
this script is to execute the INTERPRET expression

take(user, person);
The variable user, defined for this master script, contains a reference to the user terminal and
must be included in any script expression used to communicate from the program to the user.
Once a request has been made for some values, they can be furnished to the system in any
order, after an arbitrary period of time. Scripts are designed to stay active for as long as they
have not completed their purpose, i.e. until their state end is "reached". Conversely, the user
sends messages to the script by mentioning a reference to this script in an expression such as:

give(system, person ← John);

where 'system' is the internal lexical token denoting the instance of MASTER-SCRIPT used
in our example, and where 'John' is the name of an instance of the data class PERSON. This
instance could have been obtained as in the "semantic component" section above. So far, the
only gains are that arguments can be specified when desired, and that some action can possibly
be performed automatically in case a particular exception is raised.

We could also further reduce the user's responsibility by automating the process of
selecting a unique instance of the class of a variable, in this case STUDENT. This simplification
would involve defining an additional script class to gather enough property values on STUDENT
to identify uniquely an instance of this class.
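One way such an additional script could proceed is to request one property value at a time and stop as soon as a single candidate remains. The Python sketch below assumes a hypothetical `ask` callback standing in for a take request to the user, a hypothetical namesake record, and an arbitrary order of properties; none of these is specified in the text.

```python
def identify(instances, ask, properties):
    """Narrow `instances` by asking for property values until one remains."""
    candidates = list(instances)
    for prop in properties:
        value = ask(prop)                      # e.g. a take request to the user
        candidates = [c for c in candidates if c.get(prop) == value]
        if len(candidates) == 1:
            return candidates[0]
        if not candidates:
            break                              # no instance has these values
    raise LookupError(f"{len(candidates)} candidate(s) remain")


STUDENTS = [
    {"name": "John Smith", "address": "37 Purdon Dr., Toronto", "student_no": "007812345"},
    # Hypothetical namesake living elsewhere.
    {"name": "John Smith", "address": "12 College St., Toronto", "student_no": "007899999"},
]

answers = {"name": "John Smith", "address": "37 Purdon Dr., Toronto"}
asked = []


def ask(prop):
    asked.append(prop)
    return answers[prop]


student = identify(STUDENTS, ask, ["name", "address", "student_no"])
```

Here the name alone leaves two candidates, so the sketch also asks for the address and then stops; `asked` ends up as `["name", "address"]`.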

V.3. The syntactic component

V.3.1. Internal syntax

Nothing has been said in the above "script" example about how the user is informed of a
request by a script for a particular item of information. Unless explicit messages are provided,
the above take commands will generate default expressions in terms of the requested object.
The default format for a take command is:

"<script-id>: Please enter a <object.name>"

where the identifier of the script requesting an input from a user is shown before the request.
This set-up results in the following dialogue from the user's point of view:

System:  Please  enter  a  PERSON. 

User:  John. 
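The default behaviour of a take request can be sketched in Python: the prompt is built from the script identifier and the name of the requested class, and the reply is accepted only if it names a token of that class. The table `IDENTIFIERS`, the callback `read_reply` and the stripping of a trailing period are assumptions made for this illustration.

```python
# Hypothetical identifier table: identifier -> (token, classes it belongs to).
IDENTIFIERS = {
    "John": ("John", {"STUDENT", "PERSON"}),   # John is a STUDENT, hence a PERSON
    "Mary": ("Mary", {"PERSON"}),
}


def default_prompt(script_id, class_name):
    """Default format: '<script-id>: Please enter a <object.name>'."""
    return f"{script_id}: Please enter a {class_name}."


def take(script_id, class_name, read_reply, identifiers=IDENTIFIERS):
    """Send the default prompt, then resolve the reply to a token of `class_name`."""
    reply = read_reply(default_prompt(script_id, class_name)).strip().rstrip(".")
    entry = identifiers.get(reply)
    if entry is None or class_name not in entry[1]:
        raise ValueError(f"{reply!r} does not denote a {class_name}")
    return entry[0]


transcript = []


def user(prompt):
    transcript.append(prompt)
    return "John."


person = take("System", "PERSON", user)
```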


The  purpose  of  the  syntactic  component  of  a  user  interface  is  to  support  the  syntactic 
aspects  of  printed  man-machine  communications,  such  as  extracting  semantic  information 
from the above user input, or from any equivalent form, as we will see below.

We have seen above the representation of the semantic component for our example of a
user interface for an Interactive Information System. The semantic declarations also define
entries in an "internal lexicon". These lexical objects can be used by two built-in grammars to
provide, on one side, a programming language format (the INTERPRET language) for program
listings, maintenance and further development by programmers and, on the other side,
an English-like format to describe and recognize semantic objects while interacting with
"casual" users unfamiliar with INTERPRET syntax. We want to stress that already at this
stage, an INTERPRET program can be used and manipulated by both kinds of users, using
this internal lexicon and built-in grammar rules.

For  example,  the  syntactic  component  of  a  user  interface  could  extract  from  the  above 
user  input  a  reference  to  an  instance  of  the  data  class  PERSON. 


After the data definitions shown in Appendix 1, and before any additional syntactic information,
the user interface knows about the lexical tokens shown in Figure 3 as direct instances
of the classes %Const, %Class, %Property and %Token. As shown in this figure, besides the lexical
token 'John', which is recognized as an instance of the class %Const, the lexical information
known about the class PERSON includes the lexical tokens 'PERSON', 'ident', 'si#',
'address', 'name' and 'phone#', for the definition of the class itself, plus the lexical tokens
'STUDENT', 'id', 'student#', 'faculty', and 'year', for its specialization classes. Also known in
the internal lexicon are the values for enumerations of lexical tokens, such as 'part-time' and
'full-time'. Finally, the lexicon includes the printable values for the factual properties of the
semantic tokens defined in the database. These include the lexical tokens 'John Smith',
'37 Purdon Dr., Toronto', 'Arts and Science' and 'Mary Smith'. We can see from the above list
that many of these lexical tokens can be used directly in natural English expressions.

Note how "internal" lexical classes like %Const and %Class are specializations of
"English" lexical classes like %ProperNoun and %Concept respectively, which are used by a
built-in English grammar such as the one shown in Appendix 4 to recognize English-like expressions. In
our example, the lexical token 'John' is first recognized as an instance of %Const, and therefore
also as an instance of %ProperNoun. Any rule of the built-in English grammar using a
%ProperNoun will therefore accept 'John' as a proper noun.
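The way a lexical token inherits the lexical classes used by the English grammar can be sketched with two Python dictionaries. The names `LEXICON` and `LEXICAL_ISA` and the single-parent chain are simplifying assumptions; only the class names %Const, %Class, %ProperNoun and %Concept come from the text.

```python
# Direct lexical class of each token, as created from the semantic declarations.
LEXICON = {"John": "%Const", "PERSON": "%Class", "STUDENT": "%Class"}

# "Internal" lexical classes specialize "English" ones (single parent assumed).
LEXICAL_ISA = {"%Const": "%ProperNoun", "%Class": "%Concept"}


def lexical_classes(token):
    """The token's direct class followed by all its generalizations."""
    classes = []
    cls = LEXICON.get(token)
    while cls is not None:
        classes.append(cls)
        cls = LEXICAL_ISA.get(cls)
    return classes


def accepts(token, required_class):
    """Would a grammar rule asking for `required_class` accept `token`?"""
    return required_class in lexical_classes(token)
```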
V.3.2. Syntactic customization

A distinctive characteristic of INTERPRET over TAXIS and over most other programming
languages is that we explicitly model all syntactic objects, i.e. there is an internal object
for each printable object in the interface.

Syntactic  (lexical)  objects  are  connected  to  the  semantic  object  which  they  denote  by  the 
relation  named  den.  Many  syntactic  objects  can  be  defined  as  denoting  the  same  semantic 
object  by  using  the  relation  trans  between  them,  specifying  that  one  is  a  translation  of  the 
other.  For  example,  given  the  lexical  token  ’John  Smith’,  the  identifier  ’John’,  the  token  John, 
and  the  following  factual  properties: 

'John Smith'.trans = 'John'

'John'.den = John

we can derive the following equalities:

'John Smith'.den
    = 'John Smith'.trans.den
    = 'John'.den
    = John

although we gave no direct representation of the factual property

'John Smith'.den = John

In summary, the essence of our approach to syntactic customization is to introduce intermediate
translations leading to some denotation predefined by a programmer.
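The derivation of 'John Smith'.den through trans can be expressed as a short Python sketch, assuming each lexical token stores at most one direct denotation and at most one translation. The classes `LexicalToken` and `SemanticToken` are hypothetical; the sketch also guards against cyclic translations, a case the text does not discuss.

```python
class SemanticToken:
    """A semantic object such as the token John."""

    def __init__(self, name):
        self.name = name


class LexicalToken:
    """A printable token with an optional direct denotation and translation."""

    def __init__(self, text, den=None, trans=None):
        self.text, self.direct_den, self.trans = text, den, trans

    @property
    def den(self):
        """Follow trans links until a token with a direct denotation is found."""
        token, visited = self, set()
        while token.direct_den is None:
            if token.trans is None or id(token) in visited:
                raise LookupError(f"{self.text!r} has no denotation")
            visited.add(id(token))
            token = token.trans
        return token.direct_den


john = SemanticToken("John")
t_john = LexicalToken("John", den=john)               # 'John'.den = John
t_full = LexicalToken("John Smith", trans=t_john)     # 'John Smith'.trans = 'John'
```

`t_full.den` evaluates to `john` even though no direct denotation was stored on 'John Smith', which is the equality derived above.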

The most interesting aspects of the syntactic component are the extensions that a user
interface designer may define for a particular application, and for particular users. These
extensions include additions of both lexical and grammatical information. These additions can
again be made in two modes, corresponding to the two main types of users. Programmers can
use INTERPRET syntax to define new lexical and grammatical classes, assign their denotations
and insert new lexical tokens in the lexicon. Casual users can obtain the same effect,
indirectly, under the control of another built-in grammar for "syntactic training", whose patterns
trigger the same semantic actions as those specified by programmers. Of course, this
"syntactic training" grammar, like the "ordinary English" one, cannot claim to capture all
semantic actions that a programmer may define. It is sufficient that they capture most of the
common uses and definitions, with facilities to describe the limits of their capabilities and
guide the user in extending these limits when desired.

The  syntactic  component  can  thus  be  extended  to  handle  more  natural  interactions  like 
the  following  sample  dialogue: 


System:  Please  identify  yourself. 

User:  I  am  John  Smith. 

System:  OK,  you  are  the  student  John  Smith. 


The  main  additions  required  to  produce  the  behavior  shown  above  are  for  data  class 
names  and  for  some  abbreviated  property  names.  These  will  be  defined  as: 
%CONCEPT 'person' with trans ← 'PERSON';

%CONCEPT 'student' with trans ← 'STUDENT';

%RELATION 'social insurance number'
    with trans ← 'si#';

%RELATION 'telephone number'
    with trans ← 'phone#';

%RELATION 'number' with trans ← 'student#';

Our lexicon now contains the lexical classes and lexical tokens shown in Figure 3.

Some  of  these  lexical  tokens  need  to  be  further  classified  to  be  correctly  used  by  the 
built-in  English  grammar: 

%PROPER-NOUN 'Mary Smith' with den ← Mary;

%PROPER-NOUN 'John Smith' with den ← John;

%PROPER-NOUN '123456789' with den ← John;

%PROPER-NOUN '234567899' with den ← Mary;

%PROPER-NOUN '007812345' with den ← John;

These  expressions  define  the  corresponding  lexical  tokens  as  able  to  play  a  subject  or  object 
role  in  an  English  sentence,  according  to  the  Built-in  English  Grammar  shown  in  Appendix  4. 

We  also  have  to  extend  the  built-in  English  grammar  with  patterns  that  relate  specifically 
to  the  data  classes  defined  in  the  semantic  component  of  an  INTERPRET  program  (unless 
they  are  also  built-in  for  such  common  data  classes  as  PERSON,  but  we  assume  it  is  not  the 
case  here).  Some  of  the  most  common  expressions  that  could  be  used  to  answer  the  request 
’Please  identify  yourself’  are: 

'John Smith'

'I am John Smith'

'My name is John Smith'

'123456789'

'My social insurance number is 123456789'

'I live at 37 Purdon Dr., Toronto'

'My phone# is 412-7841'

'I am 007812345'

'I am the student 007812345'

'I am the first year student from Arts and Science'

'I am the person named John Smith'

and  the  list  could  go  on.  But  much  of  this  apparent  diversity  can  be  captured  with  a  few  basic 
patterns: 

GRAMMATICAL-CLASS &I-AM
    := (["I am"] &NOUN-PHRASE);

GRAMMATICAL-CLASS &MY-IS
    := "My <%PROPERTY> is <%TOKEN>";

A more detailed specification of these grammatical classes is given in Appendix 3. The
definition of grammatical classes for a given application can be much simplified by taking
advantage of built-in grammatical classes. For example, the definition of the specializations of
the grammatical class &GET-PERSON illustrates the gains obtained by borrowing grammatical
classes from the built-in English grammar: any expression involving lexical tokens satisfying
the rules of the built-in grammatical class &NOUN-PHRASE is recognized and decomposed
into standard components, and also receives a standard interpretation derived from the
interpretation of the lexical tokens involved.
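The effect of the two patterns &I-AM and &MY-IS can be approximated in Python with regular expressions and a tiny lexicon built from the %PROPER-NOUN and %RELATION declarations given earlier. This is only a sketch: the regular expressions stand in for INTERPRET's grammatical classes, a plain phrase lookup stands in for the much richer built-in &NOUN-PHRASE, and all table and function names are hypothetical.

```python
import re

# %PROPER-NOUN tokens and their den values (from the declarations in the text).
PROPER_NOUNS = {"John Smith": "John", "Mary Smith": "Mary",
                "123456789": "John", "234567899": "Mary", "007812345": "John"}

# Property phrases and their trans values (cf. the %RELATION declarations).
PROPERTY_PHRASES = {"name": "name", "social insurance number": "si#"}

# Factual properties of the semantic tokens (subset of Appendix 1).
PEOPLE = {"John": {"name": "John Smith", "si#": "123456789"},
          "Mary": {"name": "Mary Smith", "si#": "234567899"}}

# ["I am"] &NOUN-PHRASE
I_AM = re.compile(r"(?:I am\s+)?(?P<np>.+?)\.?")
# "My <%PROPERTY> is <%TOKEN>"
MY_IS = re.compile(r"My\s+(?P<prop>.+?)\s+is\s+(?P<value>.+?)\.?")


def get_person(sentence):
    """Return the set of PERSON tokens denoted by `sentence` (empty if no match)."""
    text = sentence.strip()
    m = MY_IS.fullmatch(text)
    if m and m["prop"] in PROPERTY_PHRASES:
        prop = PROPERTY_PHRASES[m["prop"]]
        return {p for p, facts in PEOPLE.items() if facts[prop] == m["value"]}
    m = I_AM.fullmatch(text)
    if m and m["np"] in PROPER_NOUNS:
        return {PROPER_NOUNS[m["np"]]}
    return set()
```

Returning a set rather than a single token keeps room for the ambiguous case, where a sentence such as a shared address would denote several persons and would need the MORE-THAN-ONE treatment described earlier.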

We  have  not  yet  defined  the  lexical  tokens  ’named’  and  'live  at'.  This  is  done  as  follows: 
%ONE-PLACE-MOD 'named' with trans ← 'name';
%TRANSITIVE-VERB 'live at' with trans ← 'address';

These definitions then allow input sentences such as 'I am named ...' and 'I live at ...', using the
specializations &I-AM-MOD and &I-VERB of &I-PRED, also defined in Appendix 3.

As an example of the analysis of an input sentence matched by the grammatical class
&GET-PERSON and its specializations, the input sentence 'I am 123456789' is first matched by
the grammatical class &I-AM (see Appendix 3), because the input fragment '123456789' matches
the specialization &PROPER-NOUN of &NOUN-PHRASE (see Appendix 4). This fragment
becomes the value of the local variable person in &I-AM, with a den property specified in
&PROPER-NOUN as the result of '123456789'.den.

We  then  have: 

'I am 123456789'.den
    = '123456789'.den
    = John


Depending on the degree of flexibility desired at this level, we may want to define other
lexical or grammatical classes relating to the recognition of instances of PERSON. For example,
'first year' as a %MOD denoting the property-value pair 'year = 1', 'from' as a
%ONE-PLACE-MOD standing for 'faculty', etc. Alternatively, we may define &GET-PERSON as a
specialization of the built-in grammatical class &SENTENCE, to take advantage of built-in
transformations that will transform input sentences of the form 'I think <&SENTENCE>',
'I tell you that <&SENTENCE>', etc. into a recognizable '<&SENTENCE>'.

Taking stock of the grammatical classes defined in Appendix 3, we have a set of specializations
of the grammatical class &GET-PERSON, as depicted in Figure 4. The final mechanism
needed is the attachment of this mini-grammar to the class PERSON, such that a request
for an instance of this class may use &GET-PERSON to recognize and obtain its answer from
an input sentence.

Finally,  we  need  to  extend  in  the  following  way  the  definitions  of  MASTER-SCRIPT 
given  in  Appendix  2  to  account  for  our  dialogue  example: 
property format on MASTER-SCRIPT .. person is
    "Please identify yourself";

property grammar on MASTER-SCRIPT .. person is
    &GET-PERSON;


This means that any expression like 'I am John Smith' has to match the grammatical class
&GET-PERSON to be recognized as a valid reference to an instance of the data class PERSON.
A request for the value of a property uses the "pattern" attached to the format property
of that property to generate an expression to be sent to user. The reply has to match the
pattern attached to the grammar property to be recognized and accepted. A grammar pattern
will decompose a successfully matched input expression, and produce a "denotation", i.e. a
reference to some "internal" object(s). The denotation of an expression matched by a pattern
is given by the value of the property den applied to this expression.
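The division of labour between the format and grammar properties can be sketched in Python: a request sends the format text, and a reply is accepted only when the grammar yields a denotation. The dataclass `ScriptVariable`, the function `request`, the retry limit and the toy grammar `get_person_den` are assumptions for illustration; the text does not say what happens after an unmatched reply, so re-prompting is merely one plausible choice.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ScriptVariable:
    name: str
    format: str                               # message sent when a value is requested
    grammar: Callable[[str], Optional[str]]   # reply -> denotation, or None if no match


def get_person_den(reply):
    """Toy stand-in for &GET-PERSON: optional 'I am' followed by a known name."""
    text = reply.strip().rstrip(".")
    if text.startswith("I am "):
        text = text[len("I am "):]
    return {"John Smith": "John", "Mary Smith": "Mary"}.get(text)


def request(var, send, receive, max_tries=3):
    """Prompt with var.format until the grammar accepts a reply; return its den."""
    for _ in range(max_tries):
        send(var.format)
        den = var.grammar(receive())
        if den is not None:
            return den
    raise ValueError(f"no acceptable value for {var.name}")


person = ScriptVariable("person", "Please identify yourself", get_person_den)
sent = []
replies = iter(["Hello there", "I am John Smith"])
value = request(person, sent.append, lambda: next(replies))
```

The first reply does not match the toy grammar, so the format message is sent a second time before 'I am John Smith' is accepted and resolved to John.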

VI.  Conclusion 

This  paper  described  three  main  ways  in  which  the  design  of  user  interfaces  can  be 
simplified: 

1. we reduced the complexity of the problem by breaking it down into three components; this
decomposition results in a unique and novel approach to user interface design, which
allows the specification of user-oriented syntactic aspects to be postponed to the last step
of the design;

2.  we  showed  how  to  maximize  the  use  of  syntactic  information  already  specified  in  the 
declaration  of  data  types  and  programs; 

3.  part  of  the  design  job  could  then  be  deferred  to  a  user  of  the  interface. 

As mentioned at the beginning, the main goal of this work is to integrate various
mechanisms found useful in many different approaches to representing and designing user
interfaces, as opposed to trying to break new ground along any of these directions. Once we
reached a common basis to this effect, we found that we could expand it arbitrarily to follow
any direction of current research, yet remain able to compare these different solutions
with one another.

In particular, any of the current work on representing and analyzing complex queries
can be expressed and integrated in our framework. An extreme example of this capability is
the representation in [Pilote 83a] of the syntax and model semantics of a subset of English
according to the work of Gazdar [Gazdar 82], which is currently attracting wide attention in linguistics.

The decomposition of a user interface into three components is particularly significant in
reducing the effort involved in setting up new user interfaces for casual users. These interfaces
can be designed in three separate steps, successively taking care of the semantic, pragmatic and
syntactic aspects of the interface.

Once a "natural" interface has been defined for data and programs represented using the
semantic features of INTERPRET, an interesting side-effect of our approach is that this interface
can then be used to examine and even modify its own structures, since all information
underlying this interface is represented in exactly the same way as any other purely semantic
information. The impact of such a mechanism remains to be explored, but it promises far-reaching
results. The feasibility of this approach has already been demonstrated by the users
of LIFER [Hendrix et al. 79], relying exclusively on synonyms and paraphrases to customize a
particular user interface. Again, our goal here is first of all to duplicate such results in a more
organized framework. In particular, this allowed us in [Pilote 83a] to integrate LIFER-like
mechanisms with more linguistically oriented ones such as the work of Gazdar.
Appendix 1

Definitions  of  semantic  data  classes 


DATA-CLASS PERSON with
    keys person: (si#);
    characteristics
        si#: %DIGIT*9;
    attributes
        address: %Token;
        name: %Token;
        phone#: &Phone#;
end

DATA-CLASS STUDENT isa PERSON with
    keys student: (student#);
    characteristics
        student#: %DIGIT*9;
    attributes
        faculty: %Token;
        year: {1..7};
        status: {'part-time', 'full-time'};
end

In  addition  to  the  above  "type"  declarations,  we  define  the  following  "data"  as  part  of  the 
extension  of  these  classes: 

STUDENT John with
    name ← 'John Smith', si# ← '123456789',
    address ← '37 Purdon Dr., Toronto',
    phone# ← '412-7846', student# ← '007812345',
    faculty ← 'Arts and Science', year ← 1;

PERSON Mary with
    name ← 'Mary Smith', si# ← '234567899',
    address ← '37 Purdon Dr., Toronto',
    phone# ← '412-7841';


Notation

- the class PERSON is an instance of the metaclass DATA-CLASS;

- the property 'name' is an instance of the property category properties;

- 'name' takes an instance of PERSON as argument to produce a lexical token;

- the class STUDENT is a subclass of PERSON;

- the property courses is multi-valued;

- {1..7} is a range class of all integers between 1 and 7.
Appendix 2
Script  definition 


SCRIPT-CLASS MASTER-SCRIPT: (start, user → end) with
    states
        start, end: STATE;
        gotPerson isa start: STATE with conditions
            person in STUDENT exc waitPerson;
        gotStudent isa gotPerson with conditions
            person in STUDENT;
        waitPerson: EXCEPTION-STATE;
        moreThanOne isa waitPerson: EXCEPTION-STATE;
    locals
        user: TERMINAL-CODE;
        person: PERSON;
    transitions
        GetPerson: (waitPerson → gotPerson) with
            actions
                a1: take(user, person);
        end
        StudId: (gotStudent → end) with
            actions
                a1: instantiate STUDENT-SCRIPT(user);
                a2: give(user, "OK. You are the student <user_name>");
        end
        GetCancel: (waitPerson → end) with
            actions
                a1: take(user, "Bye");
                a2: give(user, "OK. Bye.");
        end
end MASTER-SCRIPT


Notation

An instance of MASTER-SCRIPT represents a function from a state named 'start' to a new state named 'end'. The script also takes as a further argument an instance of TERMINAL-CODE, which identifies an external entity (a user or another script) that will send messages to the script and to which replies will be sent.

As in Petri nets, any successful transition deactivates its argument state(s) and activates its target state(s). As soon as one transition activates end, the script terminates.
Appendix 3

User-defined grammatical classes


GRAMMATICAL-CLASS &GET-PERSON isa &SENTENCE
    := &I-AM | &MY-IS | &I-PRED;

GRAMMATICAL-CLASS &I-AM := "[I am] <person>" with
    locals
        person: &NOUN-PHRASE;
        den: PERSON default person.den;
    conditions
        c1: NP-ISA-RELATED(person.head, PERSON);
end

GRAMMATICAL-CLASS &MY-IS := "My <prop> is <this>" with
    locals
        prop: %RELATION;
        this: &NOUN-PHRASE;
        trans: "the person whose <prop.trans> is <this.trans>";
    conditions
        c1: ISA-RELATED(prop.den.subject, PERSON)
            exc NOT-A-PROP-OF-A-PERSON;
end

GRAMMATICAL-CLASS &I-PRED := &I-AM-MOD | &I-VERB with
    locals
        trans: "the person whose <pred.trans> is <obj.trans>";
        obj: &NOUN-PHRASE;
end

GRAMMATICAL-CLASS &I-AM-MOD isa &I-PRED := "I am <pred> <obj>" with
    locals
        pred: %ONE-PLACE-MOD;
end

GRAMMATICAL-CLASS &I-VERB isa &I-PRED := "I <pred> <obj>" with
    locals
        pred: %TRANSITIVE-VERB;
end
Assuming the following definitions:

TRANSACTION-CLASS NP-ISA-RELATED: (np, s-class → res) with
    locals
        s-class: ANY-CLASS;
        np: &NOUN-PHRASE;
        res: BOOLEAN default false;
    actions
        a1: if np.head # nothing then
                if np.head in %PROPER-NOUN then
                    if np.head.den in s-class then
                        res ← true;
                else if np.head in %COMMON-NOUN then
                    if ISA-RELATED(np.head.den, s-class) then
                        res ← true;
end

TRANSACTION-CLASS ISA-RELATED: (s-class1, s-class2 → res) with
    locals
        s-class1, s-class2: ANY-CLASS;
        res: BOOLEAN default false;
    actions
        a1: if (s-class1 isa s-class2) or
               (s-class2 isa s-class1) then
                res ← true;
end
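For readers unfamiliar with Taxis, the two transactions above can be approximated in Python. This is a sketch under stated assumptions, not the Taxis semantics: the dictionaries encoding the Appendix 1 hierarchy and data, the `Head` record standing in for %PROPER-NOUN and %COMMON-NOUN heads, and the treatment of `isa` as a reflexive, transitive subclass test are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical encoding of the Appendix 1 schema and data.
SUPERCLASS = {"PERSON": None, "STUDENT": "PERSON"}   # direct superclass of each class
INSTANCE_OF = {"John": "STUDENT", "Mary": "PERSON"}  # most specific class of each instance


def isa(sub: str, sup: str) -> bool:
    """Reflexive, transitive subclass test (assumed semantics of 'isa')."""
    cls: Optional[str] = sub
    while cls is not None:
        if cls == sup:
            return True
        cls = SUPERCLASS.get(cls)
    return False


def isa_related(c1: str, c2: str) -> bool:
    """ISA-RELATED: true if either class is a specialization of the other."""
    return isa(c1, c2) or isa(c2, c1)


@dataclass
class Head:
    kind: str  # "proper" or "common", standing in for %PROPER-NOUN / %COMMON-NOUN
    den: str   # an instance name (proper noun) or a class name (common noun)


def np_isa_related(head: Optional[Head], s_class: str) -> bool:
    """NP-ISA-RELATED; the result defaults to False, as in the Taxis version."""
    if head is None:                   # np.head is 'nothing'
        return False
    if head.kind == "proper":          # np.head.den in s-class
        return isa(INSTANCE_OF[head.den], s_class)
    if head.kind == "common":
        return isa_related(head.den, s_class)
    return False


print(np_isa_related(Head("proper", "John"), "PERSON"))   # John is a STUDENT, hence a PERSON
print(np_isa_related(Head("proper", "Mary"), "STUDENT"))  # Mary is only a PERSON
```

With this reading, the condition c1 of &I-AM accepts "I am John" and "I am the student whose ..." but the same check against STUDENT would reject "Mary".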
Appendix 4

Built-in grammatical classes used in Appendix 3


GRAMMATICAL-CLASS &SENTENCE with
    locals
        subject: &NOUN-PHRASE;
        pred: &VERB-PHRASE;
end

GRAMMATICAL-CLASS &NOUN-PHRASE

GRAMMATICAL-CLASS &VERB-PHRASE with
    locals
        verb: %VERB | &VERB-PHRASE;
        object: &NOUN-PHRASE | &PREP-PHRASE;
end

GRAMMATICAL-CLASS &PREP-PHRASE with
    locals
        prep: %PREP;
        subject: &NOUN-PHRASE;
end

GRAMMATICAL-CLASS &PROPER-NOUN isa &NOUN-PHRASE
    := head with
    locals
        head: %PROPER-NOUN;
        trans: "<head.trans>";
end

GRAMMATICAL-CLASS &NOUN-PHRASE1 isa &NOUN-PHRASE
    := "the <head> whose <prop> is <np>" with
    locals
        head: %CLASS;
        prop: %PROPERTY;
        np: &NOUN-PHRASE;
        trans: "the <head.trans> with <prop.trans> = <np.trans>";
end

GRAMMATICAL-CLASS &NOUN-PHRASE2 isa &NOUN-PHRASE
    := "the <head> <ap>" with
    locals
        head: %CLASS;
        ap: %ADJECTIVE;
        trans: "the <head.trans> with <ap.trans>";
end
GRAMMATICAL-CLASS &NOUN-PHRASE3 isa &NOUN-PHRASE
    := "the <head> who <vp>" with
    locals
        head: %CLASS;
        vp: &VERB-PHRASE;
        trans: "the <head.trans> with <vp.trans>";
end

GRAMMATICAL-CLASS &I-TRANSFORM isa &SENTENCE with
    locals
        subj: 'I';
        trans: "<pred.verb> <user> <pred.object>";
end
Persistent Errors and Persistent Events
Issues  in  Programming  Languages  for  Information  Systems 

Alexander  Borgida 
Department  of  Computer  Science 
Rutgers  University 
New  Brunswick,  NJ  08904 

Introduction. 

At least some participants at this workshop, including myself, believe that research on Programming Languages and systems can gain notable advantages by restricting its attention to a subclass of all possible computer systems, one with some clear commonalities and wide practical importance. Information Systems (IS) appear to form such a class, and they exhibit a number of special features which must be considered, and possibly exploited, including:

-  an  increased  range  of  "persistence" 

- a larger quantity of information

-  sharing 

-  relative  volatility  of  the  procedures  used  in  relation  to  the  data  structures 

-  interaction  with  users 

These features motivate research on such disparate topics as concurrency and multiple views, integrity constraints stated independently of the programs manipulating data, user interfaces, and query languages. From the Programming Language point of view, considerable attention has been paid to dealing with the persistence of data, as attested by languages such as Pascal/R, Rigel, Adaplex, PS-Algol, etc. At this workshop, Atkinson championed the cause of a wider application of the term "persistence", suggesting the need to consider both data and functions as persistent entities. I wish to briefly discuss here two relatively neglected aspects of persistence which I nevertheless feel are important in the context of Information Systems, namely "persistent events" and "persistent errors".

Persistent  Events 

Recent research in Semantic Modeling ([1,2]) has argued that in order to develop ISs properly, we must take into consideration the procedures which will use the data as well as the structure of the data itself, and that ISs will be easier to build and use if transactions model naturally occurring events in the user environment. Unfortunately, the traditional Database view of a procedure is that of a transaction which is "atomic" (indivisible and conceptually instantaneous) in order to support concurrency control and recovery. This makes transactions inappropriate as models for the "long-term activities" which appear to abound in practical applications.

For example, a hospital IS for clinical trials of new drugs revolves naturally around the concept of a "clinical trial". Such a trial usually involves an initial interview of the patient, accompanied by a physical examination and the signing of such things as consent forms, and is followed by several successive appointments and visits with clinicians. These appointments may have to be made at specific intervals, depending on the trial design, and if patients miss appointments, special actions may have to be taken. The patient can, however, withdraw from the trial at any time, or the trial may be cancelled. All this must happen in an environment where the clinical staff and patients may change, and a trial may last months or longer.

Database Interface Workshop, Philadelphia, Oct. 27-30, 1982.
Such a concept is clearly not an atomic event, yet at the same time it is not data either, since it incorporates dynamic integrity constraints ("the transactions corresponding to administering DrugA must precede the one for DrugB"), time-triggers ("one week from now, do ..."), and user communication (e.g., form filling). The closest Programming Language concept is that of a process (e.g., ADA tasks), but in our case the number of concurrent events may be several orders of magnitude larger than that expected by a general-purpose language, and users may wish to express queries over these events as well as over the stored data ("how many trials are in stage 3?"). There is thus no natural PL tool for describing such events, yet our experience indicates that they are far more frequent than ordinary transactions for an important class of ISs ([3], [4]).

For this reason, the Taxis group (including John Mylopoulos, myself and John Barron) has extended the Taxis language to incorporate the notion of script ([5]). A script, following [6], is built around a Petri-net skeleton of states and transitions connected by arcs. At any moment, one or more states are "alive", and a transition is possible if the states leading to it are alive. However, a transition does not fire until a "guard" condition, written in Taxis, evaluates to true; when it fires, not only is the set of alive states updated, but an action (actually a Taxis transaction) associated with the transition is also executed. The conditions may involve i) waiting for messages, ii) elapsed or clock time, and iii) values of variables, while the actions may perform message transmission and database updates. Message passing also acts as a synchronization mechanism and is based on Hoare's primitives. Scripts have local variables and parameters, which, together with states and arcs, are treated as attributes of the script. Thus scripts are incorporated into the Taxis paradigm of classes of objects organized in subclass hierarchies, thereby allowing users to query script classes, and allowing designers to follow the same methodology of "description by specialization" for data, transactions and scripts ([7]). We have also found that the graphic technique of describing scripts is very useful as a requirements analysis technique for both designer and end-user.
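As a rough illustration of the firing rule just described, the following Python sketch models a script as a set of alive states and a list of transitions, each with input states, output states, a guard and an action. It is a sketch under assumptions, not the Taxis implementation: guards and actions are plain Python callables over a dictionary of script-local variables, the first enabled transition whose guard holds is fired, and message passing, clock time and persistence of scripts in a database are all omitted. The clinical-trial states and variables are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Set

Env = Dict[str, Any]  # script-local variables


@dataclass
class Transition:
    name: str
    inputs: Set[str]               # states that must all be alive
    outputs: Set[str]              # states activated when the transition fires
    guard: Callable[[Env], bool]   # the transition fires only if this holds
    action: Callable[[Env], None]  # executed after the alive states are updated


@dataclass
class Script:
    transitions: List[Transition]
    alive: Set[str] = field(default_factory=lambda: {"start"})
    env: Env = field(default_factory=dict)

    @property
    def finished(self) -> bool:
        return "end" in self.alive

    def step(self) -> bool:
        """Fire the first enabled transition whose guard holds; False if none can fire."""
        for t in self.transitions:
            if t.inputs <= self.alive and t.guard(self.env):
                self.alive = (self.alive - t.inputs) | t.outputs
                t.action(self.env)
                return True
        return False


# Hypothetical clinical-trial fragment: enrol once consent is signed,
# record two visits, then complete the trial.
trial = Script(
    transitions=[
        Transition("Enrol", {"start"}, {"enrolled"},
                   guard=lambda e: bool(e.get("consent")),
                   action=lambda e: e.update(visits=0)),
        Transition("Visit", {"enrolled"}, {"enrolled"},
                   guard=lambda e: e["visits"] < 2,
                   action=lambda e: e.update(visits=e["visits"] + 1)),
        Transition("Complete", {"enrolled"}, {"end"},
                   guard=lambda e: e["visits"] >= 2,
                   action=lambda e: None),
    ],
    env={"consent": True},
)

while not trial.finished and trial.step():
    pass
print(sorted(trial.alive), trial.env["visits"])  # prints ['end'] 2
```

Picking the first enabled transition is a simplification; in a Petri net several transitions may be enabled at once, and a real script would also idle until a message or a clock event makes some guard true.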

There  are  of  course  a  number  of  research  issues  which  must  be  addressed,  including: 

• complete implementation of scripts (we had an early prototype which stored active states and variables in a database, and used agendas of times and message receipts to "wake up" idling scripts);

• although the idea of a transition net is clearly desirable, do we need the full power of Petri nets, and if not, what is the proper restriction?

• what can one do about unanticipated transitions? (on-line handling? better techniques for anticipating them during design?)

•  what  is  the  relation  between  scripts  and  dynamic  integrity  constraints  stated  in 
various  logics? 

Exceptions  and  Persistent  Errors 

The  control  over  the  quality  of  the  information  stored  is  one  of  the  prime  concerns  in 
databases  and  considerable  research  has  been  devoted  to  this  problem.  This  control  is 
achieved  through  constraints,  some  of  which  are  inherent  in  the  data  model  being  used, 
while  others  are  explicitly  stated  as  Integrity  Constraints  (ICs)  in  some  special  language, 
which in the extreme has the power of first-order logic (FOL).

The  database  literature  on  ICs  addresses  issues  such  as  constraint  specification,  constraint 
checking  and  violation/failure  actions.  Concerning  the  latter,  the  traditional  view  is  that  a 
violation  leads  to  the  rejection  of  the  update  which  caused  it,  but  the  purely  database 
research suffers from several shortcomings:

•  no  coherent,  complete  proposal  for  handling  such  violations; 

•  an  oversimplification  of  the  circumstances  of  the  violation  (e.g.,  assuming  that 
the  last  update  is  in  error,  rather  than  it  possibly  exposing  previous  errors  in 
the  database) 

•  no  attempt  to  accommodate  in  the  database  certain  information  despite  the 
fact  that  it  violates  some  constraints. 

The latter failure is especially troublesome in view of the ubiquity of exceptions to rules in natural systems, since it leads to incomplete or inaccurate databases and to the maintenance of non-shared personal paper files.

On  the  other  hand,  PL  research  has  developed  a  coherent  theory  of  exceptions  and 
exception  handlers,  with  concern  for  the  specification  of  exceptions  and  handling 
procedures,  as  well  as  data  and  control  flow  when  exceptions  arise.  Some  of  this 
research  has  been  applied  to  PLs  for  database  programming  (PLAIN,  DIAL,  ADAPLEX,  TAXIS) 
but  both  the  original  and  the  applied  approaches  are  unsatisfactory  because  of 

•  a  lack  of  clear  definition  of  what  an  exception  is  or  what  a  handler  must 
accomplish; 

• difficulties of undoing side effects performed prior to the detection of an exception, of dealing with persistent data, and of anticipating all possible exceptions or ways of handling them;

•  no  consideration  of  interactive  exception  handling  by  the  user. 

The original exception handling mechanisms in Taxis, and the extensions currently being considered, adapt the PL mechanisms to databases by

• defining an exception as the violation of a constraint;

• requiring handlers to restore the validity of the constraint;

• allowing roll-backs, which are independently motivated in databases by concurrency control and recovery, as a way for exception handlers to restore a consistent state;

• allowing user intervention in exception handling (through scripts),

thereby combining the advantages of the PL and DB approaches.
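The first three points can be sketched together in Python: an exception is the violation of a named constraint after an update, a handler is given the chance to restore that constraint, and if the constraints still fail, the transaction's effects are rolled back. This is an illustrative sketch only: the in-memory dictionary standing in for the database, the deep-copy snapshot used for roll-back, and the names `run_transaction`, `pay` and `carry_as_credit` are assumptions, not part of Taxis. The example reuses the overpaid-bill case discussed later in this report.

```python
import copy
from typing import Callable, Dict, Optional

DB = Dict[str, Dict[str, float]]
Constraint = Callable[[DB], bool]
Handler = Callable[[DB, str], None]


def run_transaction(db: DB, update: Callable[[DB], None],
                    constraints: Dict[str, Constraint],
                    handler: Optional[Handler] = None) -> bool:
    """Apply update; let the handler repair each violated constraint;
    roll back if any constraint still fails. Returns True on commit."""
    snapshot = copy.deepcopy(db)           # state to restore on roll-back
    update(db)
    for name, holds in constraints.items():
        if not holds(db) and handler is not None:
            handler(db, name)              # the exception is the violation of `name`
    if all(holds(db) for holds in constraints.values()):
        return True                        # commit
    db.clear()                             # roll back in place
    db.update(snapshot)
    return False


def pay(amount: float) -> Callable[[DB], None]:
    def update(db: DB) -> None:
        db["cust"]["balance_due"] -= amount
    return update


def carry_as_credit(db: DB, constraint: str) -> None:
    """Hypothetical handler: turn an overpayment into a credit, restoring balance_due >= 0."""
    cust = db["cust"]
    if cust["balance_due"] < 0:
        cust["credit"] += -cust["balance_due"]
        cust["balance_due"] = 0.0


rules: Dict[str, Constraint] = {"balance_due >= 0": lambda d: d["cust"]["balance_due"] >= 0}
db: DB = {"cust": {"balance_due": 50.0, "credit": 0.0}}

committed = run_transaction(db, pay(80.0), rules)                  # no handler: rolled back
repaired = run_transaction(db, pay(80.0), rules, carry_as_credit)  # handler restores the constraint
print(committed, repaired, db["cust"])
```

Copying the whole database before every transaction is only viable for a toy example; a real system would rely on the DBMS's own logging and recovery for roll-back.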

The  problem  of  persistent  errors  however  requires  special  consideration,  in  light  of  its 
novelty.  "Exceptional  values"  ("persistent  errors")  which  need  to  be  stored  may  arise  for  a 
number  of  reasons,  including  type  violations  (e.g.,  values  outside  normal  ranges),  estimation 
(e.g., the value is not known exactly or is out of date, but the information available is better
than  nothing),  problems  with  units  (e.g.,  a  foreign  investment  reported  in  Francs,  not  in 
Dollars),  as  well  as  violations  of  more  complex  quantified  assertions  and  of  constraints 
inherent  in  the  data  model  used  (e.g.,  uniqueness  of  keys). 

For  the  purpose  of  managing  exceptional  values,  we  must  provide  language  facilities  for 

•  creating  and  storing  exceptional  entities 

•  accessing  and  discriminating  between  exceptional  and  unexceptional  objects, 
both  for  users  and  programs 
• controlling the creation and existence of exceptions to constraints

These  features  must  also  be  implemented  in  a  way  which  does  not  degrade  the  efficiency 
of  dealing  with  normal  information,  although  the  exceptions  themselves,  as  rare 
occurrences,  may  be  treated  less  efficiently. 

We  mention  here  briefly  some  aspects  of  the  mechanisms  which  are  being  developed  in 
the  context  of  an  object-oriented  semantic  model  such  as  Taxis  [8]: 

• exceptional attribute values for objects are associated with an excuse, which has an authorized creator, an expiry date, and an explanatory text, thus controlling the occurrence of exceptions;

• excuses are classified into types, both predefined and user-defined ones (e.g., ArithmeticException), which help express the nature of the exception; for example, these types can be used to avoid errors in built-in operations, such as addition;

• there exist constructs which allow the user/programmer to filter out exceptional objects, as in

for y in Employee such that ArithmeticException(Salary of y) do ...
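A minimal Python sketch of these three mechanisms follows, assuming an object model that the report does not specify: an exceptional attribute value is stored together with an `Excuse` recording its type, authorized creator, expiry date and explanatory text, and a filter lets a program skip the objects whose attribute carries an unexpired excuse of a given type, so that built-in addition never sees the exceptional value. The class names, the `excused` and `unexceptional` helpers, and all names, dates and amounts are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, Iterable, Iterator


@dataclass(frozen=True)
class Excuse:
    kind: str      # excuse type, e.g. "ArithmeticException"
    creator: str   # who authorized the exceptional value
    expires: date  # the excuse no longer covers the value after this date
    text: str      # explanatory text


@dataclass
class Employee:
    name: str
    salary: object  # normally a number, but may hold an exceptional value
    excuses: Dict[str, Excuse] = field(default_factory=dict)  # attribute -> excuse


def excused(obj: Employee, attr: str, kind: str, today: date) -> bool:
    """True if obj's attribute carries an unexpired excuse of the given type."""
    e = obj.excuses.get(attr)
    return e is not None and e.kind == kind and today <= e.expires


def unexceptional(objs: Iterable[Employee], attr: str, kind: str,
                  today: date) -> Iterator[Employee]:
    """Filter out objects whose attribute is excused, so built-in arithmetic is safe."""
    return (o for o in objs if not excused(o, attr, kind, today))


today = date(1984, 1, 15)
staff = [
    Employee("Smith", 42000),
    Employee("Jones", "about 40000",  # hypothetical estimated value, not a number
             {"salary": Excuse("ArithmeticException", "payroll manager",
                               date(1984, 6, 30), "salary only estimated")}),
]
total = sum(e.salary for e in unexceptional(staff, "salary", "ArithmeticException", today))
print(total)  # prints 42000
```

Without the filter, `sum` would fail on the string value; the excuse type is what tells the program which objects to set aside before using addition.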

The last two are especially important because the database is shared: users must navigate around exceptions allowed into the database by others. In fact, an important research issue is making explicit in application code all the constraints which that code presupposes. Other problems under consideration include

•  implementation  of  these  mechanisms  both  as  a  package  of  programs  available 
on  top  of  a  DBMS,  and  by  modification  of  the  DBMS; 

• developing a formal explanation of exception handling and exceptional values;

• extending the work to exceptions in scripts and to other data models.
Abstract

We investigate the problem of exceptional or atypical occurrences in the context of Information Systems (IS), particularly those developed within the framework of so-called "semantic data models" (e.g., TAXIS, ADAPLEX, DIAL, GALILEO). By considering exception handling, this work is the logical continuation of previous research on semantic integrity in databases, and at the same time a specialization to the case of IS of the more general work on this subject in programming languages. Of novel interest is the problem of accommodating, rather than rejecting, information and situations which do not conform to the norm.

The goal of this document is to a) motivate the need for flexibility in the way in which exceptions are handled in Information Systems, b) survey the relevant literature in various areas of Computer Science, c) consider in detail the issues surrounding the notion of "type" and type exceptions, and d) present the desiderata of a mechanism for allowing exceptions to constraints to persist.
1 Introduction and Motivating Example

Computerized information systems (ISs henceforth), such as personnel file systems, credit and billing systems, etc., are designed with the goal of assisting humans in their tasks in the business world. The primary goal for such systems has in the past been the efficient storage and retrieval of information under normal circumstances. One of the distinguishing characteristics of intelligent human behavior in the natural world is FLEXIBILITY: THE ABILITY TO DEAL WITH UNUSUAL, UNEXPECTED OR ATYPICAL OCCURRENCES. In this regard, however, practically all current commercially developed software is the exact antithesis of the human system which it replaces or assists: programs are extremely rigid and intolerant of deviations from the norm set out by their designers. As a result, these ISs have been unable to fully support the activity of their human users; in addition, many of the "horror stories" in the popular press about the misadventures of citizens dealing with computerized organizations are directly attributable to the rigidity of the software in place.

Databases, and ISs in general, are important to an organization because, among other things, they allow standardization of the form of information, thus promoting sharing, and they provide for centralized control of the quality of information by checking its "integrity". To achieve these goals, current Data Base Management Systems (DBMS) allow the organization to impose constraints, through the schema and integrity assertions, on the form and nature of the information stored, and more recently on the access and modification of this information. These constraints are of course based on regularities in the real-world enterprise, and in addition to guarding against ubiquitous data entry errors, they are often exploited for increased efficiency in storage and retrieval.

Current programming languages and database systems suffer from several problems when it comes to dealing with exceptional cases:

• since there can be many rare special cases, programs dealing with all of them may become impractically large and difficult to write;

• for any reasonably large IS, it seems very hard, if not impossible, to anticipate all exceptional occurrences; in particular, during the requirements phase, the prospective end-user will usually omit very rare cases, which may never have occurred in the past. In such cases, run-time intervention becomes necessary to handle some of them;

• in order to deal with some special cases, constraints intended to validate input data must be stated in an unnecessarily general manner (e.g., because one or two people in a company may earn over $500,000, all salaries are checked to be in the range 0..500000 rather than something more sensible like 0..99999);

• the code describing ways of dealing with special cases, even when present, is mixed in with the code for normal cases, making it much harder to design and maintain: details of special cases are not abstracted away.

Briefly stated, the goal of this research is to thoroughly investigate techniques which facilitate the production of ISs that are more flexible. The techniques will take the form of programming language constructs, effective implementation techniques for them, and methodologies for their use.



We will expand on our notion of flexibility (the ability to deal with abnormal or atypical occurrences) by presenting an extended example next.


1.1  Examples  of  atypical  occurrences  in  an  Information  System 

Consider the case of a hypothetical utility company. We might include the following constraints:

• a customer has a name, up to 30 characters long, consisting of a surname and up to 3 initials; the customer also has a Social Security Number (SS#), which uniquely identifies him/her, and an address, which has some specified form;

• each customer is associated with one power meter, from which his/her bills are generated;

• a meter also has an address, and this must match at least the street address of the person who is associated with it;

• a meter has a last reading and an associated date when it was read; these values are positive and must not decrease when updated;

• the balance-due of a customer is a non-negative dollar amount and is computed from successive meter readings;

• all the above items of information must be known at all times;

• the life cycle of a customer follows this pattern: after all necessary information about the customer has been acquired, and a meter installed if necessary, an initial reading of the meter is requested; thereafter, every 2 months a new reading of the meter must be entered, followed by the mailing of a bill. All payments from the customer are recorded, and if the bill is paid within some required period, the cycle continues as usual; if not, the customer is sent reminders and charged interest, and eventually someone is dispatched to disconnect the service. A customer may be deleted if a signed request to this effect has been received and he/she has no outstanding debts.
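To make these regularities concrete, the following Python sketch states a few of them as named predicates over a customer record and reports which ones a record violates. The record fields, the reading of the name rule as "up to three initials followed by a single capitalized surname", and the 9-digit SS# format are our own assumptions; meter addresses and the billing life cycle are left out, and the two sample customers are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Customer:
    name: str
    ssn: Optional[str]           # Social Security Number, None if unknown
    last_reading: Optional[int]  # latest meter reading, None if not yet read
    prev_reading: Optional[int]  # the reading before that, if any
    balance_due: float


# Assumed name format: up to 3 initials, then a single capitalized surname.
NAME = re.compile(r"(?:[A-Z]\.){0,3} ?[A-Z][a-z'-]+")

RULES: Dict[str, Callable[[Customer], bool]] = {
    "name format": lambda c: len(c.name) <= 30 and NAME.fullmatch(c.name) is not None,
    "SS# known": lambda c: c.ssn is not None and re.fullmatch(r"\d{9}", c.ssn) is not None,
    "reading known": lambda c: c.last_reading is not None,
    "reading positive, non-decreasing": lambda c: c.last_reading is None or (
        c.last_reading > 0
        and (c.prev_reading is None or c.last_reading >= c.prev_reading)),
    "balance non-negative": lambda c: c.balance_due >= 0,
}


def violations(c: Customer) -> List[str]:
    """Names of the rules that the customer record breaks."""
    return [rule for rule, holds in RULES.items() if not holds(c)]


regular = Customer("J.R. Smith", "123456789", 1520, 1410, 35.0)
# A newly arrived customer who has overpaid a deposit and has no meter reading yet.
newcomer = Customer("Maria de la Cruz", None, None, None, -12.5)
print(violations(regular))   # prints []
print(violations(newcomer))
```

Enforced as hard checks, these rules reject the second record outright, even though each of its irregularities is a plausible real-world situation.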

As noted above, many of these regularities are used to ensure that incorrect information due to clerical errors does not contaminate the database. However, to paraphrase Margaret Fuller, "Nature provides exceptions to every rule": given enough time and a large enough population, such exceptions are bound to occur in the great majority of large ISs.

Thus, in our example:

• in a non-homogeneous society such as that of North America, variability in the form of names is very large;

• newly arrived immigrants or "guest workers" (e.g., professors, scientists) usually do not have an SS# at the time they start renting an apartment;

• a person may leave for holidays and may wish to have bills, incurred for alarms for example, forwarded to his/her winter residence in Florida;

• due to a mechanical error (e.g., incorrectly connected meters), last month's reading may be incorrect, and therefore this month's reading, following a repair, is in fact lower than the previous month's;
• a customer may by mistake send a cheque which is larger than the balance-due, thus resulting in a negative value for this item;

• if a meter is just being installed, no reading may yet be available;

• due to labor strife, the meters may not be read for longer than the allowed 60 days, or payments may be stuck in the mail system;

• in rare circumstances, the utility company may wish to record some special information about a customer, such as the fact that he/she has had 3 meters stolen in the last few months and is suspected in the thefts.

The important observation here is that none of the above irregularities disqualifies a customer from receiving his/her statement or some important notice from the utility company; on the other hand, a customer whose latest reading is not known should not be sent a bill.


1.2  Problems  and  outline 

Unfortunately, the traditional response to the variability noted above has been either to

1. Omit most or all rules as non-enforceable.¹ Although this eliminates the problems of exceptions, it of course also destroys the advantages claimed above for a centralized DBMS, especially the control over the quality of information. Anticipating exceptions may also lead to generally inefficient implementations on current DBMS: for example, a field which contains duplicate entries, no matter how infrequently, cannot be used as a key in most databases, thus requiring a new artificial key.

2. Exclude "offending" information from the database, or "normalize" it in order to fit the mold. This, however, leads to an information base which is inaccurate, incomplete and out of date; it also often results in the maintenance of parallel manual files which are much more volatile, no longer easily shared, and outside the control of the management.

The above examples, and some of the disadvantages of record-based systems noted in [Kent 79], clearly illustrate the need for the ability to occasionally store information or perform operations which do not conform to the norm, while at the same time maintaining control over the atypical situations and not adversely affecting the efficiency of dealing with the vast majority of ordinary cases.

These capabilities are quite specific and, we feel, need to be considered within the broader context of exception handling in an IS, thus allowing for such actions as repairing errors, aborting transactions, interrogating users, etc. It will therefore be important to review the relevant contributions to this problem in diverse areas such as Programming Languages, Databases and Artificial Intelligence.

In particular, we will survey in Section 2 the various techniques for exception handling provided in general-purpose Programming Languages and in programming languages for IS design, respectively. In further subsections we will take a closer look at the problem of Integrity Constraints in databases, and at the way in which AI research has addressed the issue of exceptional occurrences.

¹ An example of this is the computerized information system at Queen's University, where the only rule enforced for students enrolling in a course is that the course be offered [Jardine 81].

In turn, the notion of "atypical" or unexpected occurrence makes sense only once the normal or expected cases have been described. For this reason, we will then describe a conceptual framework for specifying ISs and the concomitant notion of "exception". We will take particular interest in the way in which the concept of "type" constrains the values to be stored in the database, the advantages of typed languages, and the problems encountered when type rules are allowed to be broken.


2  Related  Research 


2.1  Exception  Handling  in  General  Programming  Languages 

The following is a rationalized summary of the important contributions to our understanding of exceptional situations from recent research in Programming Languages ([Goodenough 75], CLU [Liskov & Snyder 79], [Levin 77], ADA [Ichbiah et al. 79], [Cristian 82]).

In general, in dealing with exceptions one must address the issues of exceptional situation detection, transmission, diagnosis and recovery. In Programming Language research, a sketchy view of this process is that one "procedure" (the caller) makes use of ("invokes") another procedure (the signaler) in order to perform some task. During the execution of the signaler, an exceptional situation may be detected; this results in an exception instance being raised, and control being diverted to a piece of program text (the handler) associated with that exception. The following is a checklist of issues which most proposals have addressed, together with an indication of the range of answers provided:

1. Definitional issues

• What is an exception? An exceptional situation may be one where an
operation is terminated due to an error (ADA, CLU), or, more generally, any
situation brought to the attention of the invoker of an operation
([Goodenough 75], [Levin 77]).

• How is an exception specified? An exception is a special type of object in a
programming language (like a variable or procedure) whose identifier is
explicitly declared, usually as part of the definition of a procedure or module.

• What is a handler? The handler is by definition the program text to be
executed when some particular exception instance is signalled. Levin
([Levin 77]) and Cristian ([Cristian 82]) provide, in addition, a formal specification of
what a handler is expected to accomplish, using pre- and post-condition
assertions.


This often means only a textual unit.
• How is a handler specified? A handler H is usually syntactically attached to
some program text unit S which contains an invocation of a signaler that can
raise exception C (e.g., S [C => H]); the grain of the text unit varies from
as small as an expression ([Levin 77]) to as large as a
block/procedure/module ([Ichbiah et al. 79]). Note that the same exception
condition (NOT any particular instance) may have different handlers associated
with it at different places, so that different invocations of the same signaler
may have different handlers associated with them. Even one instance of an
exception may have several potentially competing handlers associated with it,
as in the case of a nested block structure, where with each block a handler
may be specified for the exceptions arising in it.

2. Control Flow Issues

• How is an exception raised? Languages offer some variant of a signal or
raise statement which can appear anywhere other statements appear.

• Which handler is executed? Most proposals use some notion of "visibility"
based on lexical block or procedure nesting structure. There is a close
resemblance to variable visibility rules in Algol-like languages, if one considers
some invocation of a signaler which raises E as "the attempt to access
variable E" and the various syntactic handler specifications S_i [E => H_i] as
"declarations and initializations of E". Therefore, given a group of procedures,
only those on the "calling hierarchy" of the actual signaling invocation may
propose handlers, and normally the innermost one is selected. Levin has made
an additional important distinction between flow exceptions, which occur in
attempting to carry out some specific independent operation, and structural
exceptions involving data structures that may be shared by several active
program contexts. In the latter case, all users of the particular data structure
should have a say about what to do in case of an exception related to it.
For this purpose, Levin allows for the parallel or sequential execution of the
visible handlers.

• What happens when the calling procedure does not provide a handler for an
exception? In ADA, the exception is simply propagated up the calling
hierarchy until a handler is located. In [Liskov & Snyder 79] it is argued that
this destroys the procedural abstraction, and therefore unhandled exceptions
are converted to a special exception FAILURE, which is then passed up the
calling hierarchy. In addition, almost all proposals recognize the potential
burden of specifying handlers for myriads of exceptions. For example,
Goodenough ([Goodenough 75]) allows a default handler to be
attached to the exception definition. Also, most languages offer a case
construct with an else branch (e.g., when others: h end;) allowing handlers to
be given for all exceptions not explicitly named.

• How does flow of control continue after the handler terminates? According
to the termination model (CLU, ADA), the signaler ceases to exist (i.e., is
aborted) once it raises the exception, and the handler is considered to have
completed the statement to which it was attached. The resumption model
([Levin 77]) suggests that the signaler be resumed after the handler
completes execution, although the handler may suggest a label indicating where
execution continues. Goodenough supports both views by classifying an
exception as being either an escape or a notify type (a signal type can be
either). Within the resumption model, Levin provides an option between a
wait policy, where the signaler continues only when the exception handler has
completed, and a no-wait policy, where the handler is assumed to be
executing in parallel with the resumed signaler. As suggested in [Liskov & Snyder 79],
the resumption model is strictly more powerful than the termination model,
but unfortunately entails a more complex implementation and further damages
the procedural abstraction by requiring the handler procedure to know about
the continuation of the signaler.

This problem arises if one allows parallel processing or if the language provides modules, one of which
may be shared and active without any of its operations currently executing (e.g., a storage allocation module).

3. Data flow issues

• What information is available when executing the handler? ADA allows no
information to be passed between the caller and signaler, thus making global
variables the only method of communication. In [Goodenough 75] and
[Liskov & Snyder 79], exceptions are allowed to pass out information
through result parameters, while Levin's exceptions resemble function calls in
the sense that information can be passed in and out through the parameters
of an exception.

• How can one restore the variables to a consistent set of values after
detecting an error? All of the proposals rely on the user to perform such
recovery actions as may be necessary, presumably in part because the cost
of providing a built-in back-up operation appears to be prohibitive.
Goodenough, for example, provides for cleanup actions before resuming a
signaler.


1  Exception  Handling  in  Database  Languages 

Most application programs for data bases are written in some host language (often Cobol
or PL/I), from which data base manipulation primitives are invoked. Probably the most
widely used exception handling facility is that provided by the PL/I ON-condition, which
was a precursor of most recent proposals but which has several weaknesses ([Levin 77]):
it is a form of non-local GO TO, the handler program has peculiar restrictions, and it relies
on global variables for passing information between contexts.

Three languages designed specifically for programming ISs, PLAIN ([Wasserman 77]),
TAXIS ([Mylopoulos et al. 80]) and DIAL ([Hammer & Berkowitz 80]), also offer
exception handling facilities.⁴

Among the novel features of PLAIN are (i) exception handlers which are single
procedures, (ii) a third option in flow control: retrying the signaler after exception handling,
and (iii) a special "terminal" exception (FAIL) which must be propagated up, although each
procedure may specify a cleanup action before propagation.
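PLAIN's retry option and its terminal FAIL exception might be approximated as follows. This is a sketch under our own assumptions: the helpers `retry_after_handling` and `with_cleanup`, the `max_retries` bound (added to rule out endless retrying), and the billing example are hypothetical and are not PLAIN syntax.

```python
class Fail(Exception):
    """Terminal exception: must propagate upward; procedures may only clean up."""


def retry_after_handling(signaler, exc_type, handler, max_retries=3):
    """Call `signaler`; on `exc_type`, run `handler`, then retry the signaler."""
    for _ in range(max_retries + 1):
        try:
            return signaler()
        except exc_type as exc:
            handler(exc)  # e.g. repair the offending data before retrying
    raise Fail(f"{signaler.__name__} still failing after {max_retries} retries")


def with_cleanup(procedure, cleanup):
    """Let Fail propagate, but run this procedure's cleanup action first."""
    try:
        return procedure()
    except Fail:
        cleanup()
        raise


# Example: a bill computation that fails until a default rate is supplied.
state = {"rate": None}
log = []


def compute_bill():
    if state["rate"] is None:
        raise ValueError("rate missing")
    return 100 * state["rate"]


def supply_default_rate(exc):
    log.append(str(exc))
    state["rate"] = 2


bill = retry_after_handling(compute_bill, ValueError, supply_default_rate)


def always_broken():
    raise ValueError("never repaired")


cleaned = []
try:
    with_cleanup(
        lambda: retry_after_handling(always_broken, ValueError, lambda e: None),
        lambda: cleaned.append("undo partial work"),
    )
except Fail:
    pass  # FAIL reached the top level after the cleanup ran
```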

The description of exception handling in [Hammer & Berkowitz 80] suggests that in DIAL
exceptions are used as alternative communication mechanisms; for example, one may use a
predicate resembling Is-signalled?(E) as part of a conditional expression controlling the
execution of a loop which calls in its body a procedure that can signal E.

⁴ The conceptual modeling languages ADAPLEX ([Smith et al. 81]) and GALILEO ([Albano 83]) offer
exception-handling facilities inherited from their ancestors, ADA and ML respectively.
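The DIAL usage pattern, in which an exception steers a loop instead of aborting it, can be sketched in Python. The `SignalRecord` class and its `is_signalled` method are hypothetical stand-ins for the predicate resembling Is-signalled?(E); they are not taken from [Hammer & Berkowitz 80].

```python
class EndOfInput(Exception):
    """The exception E that the loop body's procedure may signal."""


class SignalRecord:
    """Remembers which exception classes have been signalled so far."""

    def __init__(self):
        self._signalled = set()

    def call(self, procedure, *args):
        """Invoke `procedure`; record (rather than propagate) any exception."""
        try:
            return procedure(*args)
        except Exception as exc:
            self._signalled.add(type(exc))
            return None

    def is_signalled(self, exc_type):
        """Analogue of the Is-signalled?(E) predicate."""
        return exc_type in self._signalled


def next_reading(source):
    if not source:
        raise EndOfInput()
    return source.pop(0)


readings, source, signals = [], [12, 15, 19], SignalRecord()
while not signals.is_signalled(EndOfInput):  # the exception steers the loop
    value = signals.call(next_reading, source)
    if value is not None:
        readings.append(value)
```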

The TAXIS exception handling mechanism shares procedurality with PLAIN but associates
exceptions only with prerequisite conditions guarding the use of transactions. Since TAXIS is
an object-oriented language, exceptions are considered to be objects also, but they can
only be raised by the failure of a prerequisite assertion for some transaction or by built-in
operators of the language. As in PLAIN, exception handlers are supposed to be
transactions.

Although not an actual programming language, we should mention here the facilities
provided by the Office Specification Language (OSL) defined in [Kunin 82]. Significantly, Kunin
recognizes the need for separating out the description of what happens normally from
what occurs in special or exceptional situations, as a technique for managing and organizing
the overwhelming mass of details in an office procedure. In the OSL scheme of things,
each OSL primitive activity has associated specific exceptions which it can raise, and, most
interestingly, there are specific exceptions associated with the violation of timing
constraints. These constraints form a subset of the "dynamic integrity constraints" needed
for databases.


2.2 Problems associated with traditional Exception Handling techniques

Although substantial progress has been made in our understanding of the issues and
alternative strategies available, from our point of view previous Programming Language
proposals suffer from the following shortcomings:

• The notion of "exception" is extremely loose and overly general, ranging from
"error condition" to "special situation", and is often used as a normal
programming technique (e.g., [Goodenough 75]).

• There is no clear description of the task that a handler must perform.

• When flow of control returns to the signaler, procedural abstraction is broken
down, as noted in [Liskov & Snyder 79]. If, however, the signaler is
terminated upon raising an exception, it must presumably first clean up all its
side effects before passing control to the handler; yet this task seems to
logically belong in the exception handler: what to do if something goes
wrong? In both cases, undoing side effects is a potentially tedious and
error-prone task.

• Even in the area of programming languages for IS design, no effort has been
made to take advantage of the fact that the programs developed are
Information Systems, a restricted subclass of all possible computer systems
with special characteristics. Two such characteristics are the persistence of
data well beyond the end of the current program invocation, and the
presence, in interactive ISs, of humans at execution time.

• Although Levin has considered the issue of allowing a signaling procedure to
continue even after it has raised an exception, this approach was not
considered for the case when the exception is raised by one of the language
primitives (e.g., an assignment statement which violates run-time type checking,
overflow, etc.).


2.3 Integrity Constraints in Data Bases

The problem of maintaining a database free of semantic errors, arising through user
carelessness or lack of knowledge, has received considerable attention, with
[Hammer & McLeod 75], [Eswaran & Chamberlin 75] and [Brodie 78] being notable early
contributions. The problem is often considered in three parts:

1. Constraint specification: A number of Data Definition/Manipulation Languages in the
literature ([Stonebraker 75], [Chamberlin et al. 75], [Smith et al. 81]) offer extensions
for the specification of semantic integrity constraints. In addition, [Hammer & McLeod 75],
[Brodie 78] and [Wilson 80] provide special languages for this purpose. Generally,
these languages allow one to express constraints roughly equivalent to First Order Logic,
and integrity constraints are often classified along a number of dimensions that are
significant for checking purposes.

2. Constraint checking: Probably most effort has been invested in this aspect of the
integrity problem, and we content ourselves with providing a list of references to some
notable papers in the area: [Stonebraker 75], [Buneman], [Nicolas 79], [Sarin 77],
[Bernstein et al. 80], [Koenig & Paige 81], [Blaustein 81].

3. Violation or failure actions: In addition to rejecting the update and possibly returning
an error code, the following options have been mentioned ([Hammer & McLeod 75],
[Eswaran & Chamberlin 75]): (i) substituting a new value for the erroneous one and
rechecking, (ii) calling an external procedure, (iii) notifying an authority, (iv) providing a list of
the records which caused the current violation of a "tuple" assertion, and (v) backing out
changes to a previous consistent state. As an example of the latter, the System R
recovery manager ([Gray et al. 81]) provides commands to SAVE the state of a file as
well as to UNDO all changes to it since a previous SAVE; these commands can be used as
part of application programs. An important point here is that this exception handling facility
is independently needed for hardware failure recovery and consistency maintenance in the face
of concurrency, and hence in some sense comes at no extra cost.
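Two of the violation actions listed above, substituting a value and rechecking (i) and backing out to a previous consistent state (v), can be combined in a short Python sketch. The `Store` class with its `save`/`undo_to` methods is a hypothetical, snapshot-based stand-in loosely modelled on the SAVE/UNDO idea; it is not the System R recovery manager's interface, and copying the whole state is only reasonable for toy data.

```python
import copy


class ConstraintViolation(Exception):
    pass


class Store:
    """Toy record store with SAVE/UNDO-style backout (illustrative only)."""

    def __init__(self, constraints):
        self.records = {}
        self.constraints = constraints  # list of (name, predicate on records)
        self._saves = []

    def save(self):
        """Remember the current state; return a savepoint handle."""
        self._saves.append(copy.deepcopy(self.records))
        return len(self._saves) - 1

    def undo_to(self, savepoint):
        """Discard all changes made since `savepoint` was taken."""
        self.records = copy.deepcopy(self._saves[savepoint])
        del self._saves[savepoint + 1:]

    def check(self):
        for name, predicate in self.constraints:
            if not predicate(self.records):
                raise ConstraintViolation(name)

    def update(self, key, value, substitute=None):
        """Apply an update; on violation optionally substitute and recheck,
        and otherwise back out to the state before the update."""
        savepoint = self.save()
        self.records[key] = value
        try:
            self.check()
            return
        except ConstraintViolation:
            if substitute is None:
                self.undo_to(savepoint)  # option (v): back out
                raise
        self.records[key] = substitute(value)  # option (i): substitute...
        try:
            self.check()                       # ...and recheck
        except ConstraintViolation:
            self.undo_to(savepoint)
            raise


store = Store([("non_negative", lambda recs: all(v >= 0 for v in recs.values()))])
store.update("balance", 40)
store.update("balance", -5, substitute=lambda v: 0)  # repaired to 0
try:
    store.update("balance", -7)                       # no substitute: backed out
except ConstraintViolation as exc:
    rejected = str(exc)
```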


2.4 Problems with exception handling in current Databases

After a promising start, little advance has been made in the past half decade on the
problem of handling integrity exceptions, and the original work suffered from a number of
shortcomings:

• None of the proposals produced a specific, complete exception handling
mechanism which integrated with the application language in the manner of
traditional Programming Languages. In [Eswaran & Chamberlin 75], there is
only a discussion of a list of disparate issues related to integrity constraints,
while the proposal in [Hammer & McLeod 75] does not, for example,
support multiple handlers for the same constraint (depending on when the
violation occurred), and fails to allow for roll-back.
• More importantly, none of the proposals allow one to fully accommodate
exceptions to integrity assertions of the type illustrated in Section 1. Although
[Hammer & McLeod 75] recognized the need for "soft" rules, which give
rise only to warnings, no provisions were made for storing and later
manipulating such exceptional information. Also, no one recognized the
problems raised by the presence of an inconsistency between the general
constraint and some specific fact, which leads to logical chaos.

• Since the schema and other constraints inherent in data models were not
considered as Semantic Constraints, no one has addressed the problem of
allowing violations of such constraints to persist.

• The issue of what is "erroneous" information has not been considered in
adequate detail. When inserting a record x whose attribute p is required to be
identical to attribute q of some other record y, it is not obvious that x.p,
rather than y.q, is to blame for the failure of this constraint, as is implied in
[Eswaran & Chamberlin 75].

• Although the importance of state transition assertions has been recognized,
these appear to be relatively difficult to state independently and may be
expressed more conveniently as pre/post-conditions on the transactions
modifying the database. Furthermore, as illustrated in the example of
Section 1, a complete IS may incorporate constraints on possible sequences of
actions, including communication patterns with the IS user. Because of the
emphasis on data, the problem of violation actions for these types of
constraints has not been considered in database research.


2.5 Exceptions in Artificial Intelligence Research

In attempting to simulate intelligent behavior, researchers in Machine Vision, Natural
Language Understanding and other areas of AI have been forced to provide the machine
with generic information about the world, which can be used to comprehend specific
situations that the computer might encounter. The problem of exceptions arose quite early
in this context, since everyday reality involves mostly natural kind categories or terms
which, unlike mathematical ones, do not have definitions in terms of necessary and sufficient
conditions. Thus, the information that "Canaries are yellow" must co-exist with the specific
facts that "Tweety is a canary" and "Tweety is green". As a result, most representations
adopted a "prototype" or "default" approach to descriptions: "if it is consistent to do so,
assume that a canary is yellow". Reasoning with such default rules has been formally
studied in AI under the rubric of "non-monotonic logics" (see [AI 80] for a milestone
compendium of papers), and there are now several schemes which allow logical deduction
in the presence of seemingly contradictory default rules and specific facts. As we have
argued, most rules have exceptions in reality, so in a database environment most rules
would have to be default ones, with the de facto assumption that specific facts are
correct. Unfortunately, this would mean that such constraints cannot be used to guard the
integrity of the database from erroneous updates.
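The dilemma described above can be made concrete with a deliberately simplistic Python sketch of default inheritance. It is not an implementation of any particular non-monotonic logic from [AI 80]; the `DEFAULTS` table, the `strict` flag, and the individual Polly are illustrative assumptions.

```python
# Generic knowledge: "if it is consistent to do so, assume a canary is yellow".
DEFAULTS = {"canary": {"colour": "yellow"}}

facts = {
    "Tweety": {"isa": "canary", "colour": "green"},  # specific fact overrides
    "Polly": {"isa": "canary"},                      # default fills the gap
}


def attribute(individual, attr):
    """Prefer a stored specific fact; otherwise fall back on the class default."""
    own = facts[individual]
    if attr in own:
        return own[attr]
    return DEFAULTS.get(own["isa"], {}).get(attr)


def store_fact(individual, attr, value, strict=False):
    """Record a fact. With default rules nothing is ever rejected; with
    strict (non-default) rules a contradicting value is refused."""
    if strict:
        expected = DEFAULTS.get(facts[individual]["isa"], {}).get(attr)
        if expected is not None and value != expected:
            raise ValueError(f"{individual}.{attr} = {value!r} contradicts the rule")
    facts[individual][attr] = value
```

Under default semantics a doubtful update such as `store_fact("Polly", "colour", "purple")` is silently accepted; in strict mode the same check would also refuse the legitimate fact that Tweety is green, which is exactly the trade-off discussed above.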

It would seem that we require a mechanism which detects inconsistencies in updates but,
if desired, modifies the set of formulas to restore consistency in order to accommodate
the new facts. The Truth-Maintenance System (TMS) [Doyle 79] provides some mechanisms for
this by keeping justifications with all information, so that inferences from no-longer-held
beliefs (e.g., default ones just contradicted) can be undone. However, in TMS formulas are
accepted or rejected in toto, without any attempt to keep a modified version of them.
Kowalski [Kowalski 78] has also illustrated a way in which a general rule in a PROLOG
logic base can be modified to accommodate an "exceptional fact" (e.g., all first year
courses are held in Room 145, and CompSci100 is a first year course held in Room 139).
Unfortunately, no general algorithm has been provided for doing this, and it remains entirely
unclear what the constraints on such modifications are (e.g., why isn't the constraint modified
to be everywhere TRUE?).
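Kowalski's example can be rendered as a general rule with an explicit exception clause. The Python sketch below is our own reading, not Kowalski's formulation; the courses other than CompSci100 are hypothetical, and the sketch accommodates only this one exception rather than supplying the general modification algorithm whose absence is noted above.

```python
# Prolog-style reading of the modified rule (our rendering, not Kowalski's text):
#   room(C, 145) :- first_year(C), not exceptional_room(C, _).
#   room(C, R)   :- exceptional_room(C, R).
first_year = {"CompSci100", "Math101", "Phys110"}
exceptional_room = {"CompSci100": 139}  # the accommodated "exceptional fact"


def room(course):
    """Return the room for `course`, or None if nothing can be derived."""
    if course in exceptional_room:       # specific fact
        return exceptional_room[course]
    if course in first_year:             # general rule, now with an exception clause
        return 145
    return None


def rule_holds_unmodified():
    """Check the original, unmodified rule: every first-year course is in 145."""
    return all(room(c) == 145 for c in first_year)
```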

A different, non-formal approach to the problem of inaccurate generic descriptions was
first proposed in [Minsky 74] and considerably elaborated in [Lesperance 80]; it suggests
that all exceptions (contradictions between concept and instance descriptions) must be
explainable by the interaction of possibly conflicting concept descriptions. Thus, a PERSON
is expected to have 2 legs, and an AMPUTEE will have fewer than 4 limbs; if now the
main character of another class, such as PIRATE_STORY, is required to be a PERSON, then
Captain Kidd's wooden leg can be excused by his presence as an instance of the class
AMPUTEE. It is significant to note that excuses here are context dependent, in the manner
of exception handlers in Programming Languages, the context being the so-called PART-OF
hierarchy. Unfortunately, in the Information System domain this approach appears to
require a comprehensive list of near-miss classes, such as AMPUTEE, which would be a
great burden on designers, users and memory. Our proposed research attempts to alleviate
such problems by adopting the view that users provide excuses.
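The excuse mechanism can be caricatured in a few lines of Python: a constraint failure of one class is tolerated if the individual also belongs to a near-miss class listed as excusing it. This flat sketch is our own simplification of [Minsky 74] and [Lesperance 80]; the `DESCRIPTIONS` table, the limb counts, and the PIRATE_STORY check are hypothetical, and real excuses would be sought relative to the PART-OF context rather than globally.

```python
from dataclasses import dataclass


@dataclass
class Individual:
    name: str
    classes: frozenset
    legs: int
    arms: int


# Class descriptions: a check on instances, plus the near-miss classes that
# may excuse a failure of that check (this table is what designers must supply).
DESCRIPTIONS = {
    "PERSON": {"check": lambda x: x.legs == 2, "excused_by": {"AMPUTEE"}},
    "AMPUTEE": {"check": lambda x: x.legs + x.arms < 4, "excused_by": set()},
}


def unexcused_violations(x):
    """Class descriptions that x fails and that no other class of x excuses."""
    failures = []
    for cls in sorted(x.classes):
        description = DESCRIPTIONS[cls]
        if not description["check"](x) and not (description["excused_by"] & x.classes):
            failures.append(cls)
    return failures


def check_pirate_story(main_character):
    """PART-OF context: a PIRATE_STORY requires its main character to be a PERSON."""
    if "PERSON" not in main_character.classes:
        return ["PIRATE_STORY.main_character is not a PERSON"]
    return unexcused_violations(main_character)


kidd = Individual("Captain Kidd", frozenset({"PERSON", "AMPUTEE"}), legs=1, arms=2)
stranger = Individual("Stranger", frozenset({"PERSON"}), legs=1, arms=2)
```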

We mention here that a number of researchers have considered the problem of
conflicting concept descriptions (e.g., "BIRDs can fly", "PENGUINs are BIRDs", "PENGUINs
cannot fly") and have developed similarity mechanisms as an alternative to default
reasoning; [Lesperance 80] is again a good reference to this area.

Exceptions have also been considered in the context of stereotyped event sequence
descriptions in AI. In particular, Schank and Abelson ([Schank & Abelson 75]) consider
exceptional developments which can interfere with the computer "following a script", such
as that of eating in a restaurant. Such occurrences as obstacles (unsatisfied preconditions for
subevents), errors (unaccomplished goals of subevents) and distractions cause the computer
to shift attention to error-handling "what-if" scripts that set up new expectations.
Unfortunately, the "exception handlers" are themselves only stereotypical, and eventually
recourse may have to be made to a general planning mechanism (which is not present in
databases); moreover, in the above proposal the handlers are independent of the more
general context.


3 Conceptual Modelling: a Context for Research

An IS is in some important sense a model of the world about which it contains
information. The world is seen as populated by objects, which are inter-related through
properties, and which participate in activities. The designer controls the quality of
information stored in an IS by describing generic classes of objects and insisting that every
specific object belong to at least one class. Similarly, the transactions of the IS are
intended to model the activities normally carried out in the enterprise, including information
gathering and displaying.

Each class of objects is described in terms of the properties attributable to its instances,
together with constraints on the possible values of these properties. Some of these
constraints, such as the range of values which some property may take on, are often
expressed in a special syntax by reason of their frequent occurrence; other constraints
are left to be expressed in some general formal notation, in this case based on First Order
Logic. All constraints will be classified as being initial, final or universal, depending on
whether they are expected to hold at the time an object is inserted into the class, at the time
it is deleted from the class, or throughout its lifetime as an instance of that class. In addition,
the checking of some constraints can be postponed until the end of each transaction (i.e.,
during transactions, the constraint may be violated).

Figure 1 contains a partial description of the class CUSTOMER for our hypothetical utility
company.

data class CUSTOMER with
    attributes:
        name: NAME_VALUE;
        mailing_address: ADDRESS;
        meter: METER;
        balance_due: $AMOUNT;
        rate: $AMOUNT;
    initially
        start-clean: (balance_due = 0);
    finally
        no_debt: (this.balance_due = 0);
    always
        knownSS: (this.SS# ≠ nothing);
        keySS: (∀z ∈ CUSTOMER)(this.SS# = z.SS# ⇒ this = z);
end CUSTOMER;

Figure 1: The CUSTOMER data class
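The intended semantics of the initially, finally and always clauses of Figure 1 can be sketched in Python, with the primitive operations insert, modify and remove performing the checks. This is a toy model under our own assumptions: the `DataClass` helper is hypothetical, SS# is added to the attribute list because the figure's constraints refer to it, and always-assertions are rechecked only for the object being changed.

```python
class ConstraintError(Exception):
    pass


class DataClass:
    """Toy emulation of a TAXIS-like data class (hypothetical, for illustration).

    initially: checked when an object is inserted
    finally:   checked when an object is removed
    always:    checked after every insert or modify
    Each assertion is (label, predicate(obj, all_instances)).
    """

    def __init__(self, name, attributes, initially=(), finally_=(), always=()):
        self.name = name
        self.attributes = set(attributes)
        self.initially, self.finally_, self.always = initially, finally_, always
        self.instances = []

    def _check(self, assertions, obj):
        for label, predicate in assertions:
            if not predicate(obj, self.instances):
                raise ConstraintError(f"{self.name}.{label} violated")

    def insert(self, values):
        unknown = set(values) - self.attributes
        if unknown:
            raise ConstraintError(f"no such attribute(s): {sorted(unknown)}")
        obj = dict(values)
        self._check(self.initially, obj)
        self.instances.append(obj)
        try:
            self._check(self.always, obj)
        except ConstraintError:
            self.instances.pop()  # undo the insertion
            raise
        return obj

    def modify(self, obj, attr, value):
        if attr not in self.attributes:
            raise ConstraintError(f"no such attribute: {attr}")
        had_value, old = attr in obj, obj.get(attr)
        obj[attr] = value
        try:
            self._check(self.always, obj)
        except ConstraintError:
            if had_value:
                obj[attr] = old  # restore the previous value
            else:
                del obj[attr]
            raise

    def remove(self, obj):
        self._check(self.finally_, obj)
        self.instances = [o for o in self.instances if o is not obj]


CUSTOMER = DataClass(
    "CUSTOMER",
    # SS# is used by the constraints of Figure 1, so it is listed here too.
    ["name", "SS#", "mailing_address", "meter", "balance_due", "rate"],
    initially=[("start-clean", lambda c, _: c.get("balance_due") == 0)],
    finally_=[("no_debt", lambda c, _: c.get("balance_due") == 0)],
    always=[
        ("knownSS", lambda c, _: c.get("SS#") is not None),
        ("keySS", lambda c, everyone: all(
            z is c or z.get("SS#") != c.get("SS#") for z in everyone)),
    ],
)
```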

In Semantic Models, data classes are often organized in a subclass hierarchy (e.g.,
INDUSTRIAL_CUSTOMER IsA CUSTOMER), where the definition of the subclass need not
duplicate material specified in the superclass's definition: it is enough to show additional
attribute definitions or strengthened versions of old constraints.
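The IsA organization above resembles class inheritance. In the hypothetical Python sketch below, IndustrialCustomer states only its extra attribute and a strengthened constraint and inherits the rest; the attribute `contract_no` and both rate constraints are illustrative assumptions, not part of the original example.

```python
class Customer:
    """Superclass: attributes plus constraints given as (label, predicate)."""

    attributes = frozenset({"name", "SS#", "rate", "balance_due"})

    @classmethod
    def constraints(cls):
        return [("positive_rate", lambda c: c["rate"] > 0)]


class IndustrialCustomer(Customer):
    """IsA CUSTOMER: states only what is added or strengthened."""

    attributes = Customer.attributes | {"contract_no"}  # additional attribute

    @classmethod
    def constraints(cls):
        # Inherited constraints still apply; the subclass strengthens them.
        return super().constraints() + [("bulk_rate", lambda c: c["rate"] >= 1.0)]


def violations(cls, record):
    """Unknown attributes, then labels of constraints the record fails."""
    unknown = sorted(set(record) - cls.attributes)
    failed = [label for label, check in cls.constraints() if not check(record)]
    return unknown + failed


base = {"name": "Acme", "SS#": 42, "rate": 0.5, "balance_due": 0}
```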

Given a collection of such class definitions provided by the IS designers, information can
be recorded and modified only by manipulating objects using primitive operations such as

    insert x in CUSTOMER with name: ..., SS#: ..., ...;
    modify y.rate to $2.50;
    remove y from CUSTOMER;

It is these primitive operations that will detect inconsistencies between the generic
information of the data class description and the specific information being stored.

Information in the database can be obtained through the use of retrieval operators such
as

    get y from CUSTOMER with SS#: 789789789
    evaluate amount_due attribute of y /* syntax: y.amount_due */

For reasons of uniformity, we choose to view transaction definitions as classes as well,
with properties specifying the participants (parameters), and constraints on the proper
instantiation (invocation) of the transaction. These constraints are either pre-conditions,
post-conditions, intermittent assertions or invariants, depending on whether they are
expected to hold at the time the transaction is initiated, at the time it terminates, at some
specific point, or throughout its execution. In addition, the body of the transaction specifies
the actions which must be taken. The transaction corresponding to the reading of a meter
is illustrated in Figure 2.

transaction class METER_READ with
    parameters: (mtr, newreading, newdate)
    variables
        mtr: METER;
        newreading: 0..100000;
        newdate: DATE;
    preconditions
        consumed: (newreading >= mtr.lastreading);
        later: (newdate >= mtr.lastreadingdate);
    actions
        modify mtr.lastreading to newreading;
        modify mtr.lastreadingdate to newdate;
end METER_READ

Figure 2: The METER_READ transaction

A transaction is invoked and terminates in the manner of procedures in ordinary
programming languages, with the important additional requirement that its execution can be
regarded as atomic, and "guarded" by the assertions, in the sense that changes are not
committed if some condition is violated.
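The guarded, atomic behavior of METER_READ can be sketched in Python. The parameter names and the precondition labels come from Figure 2; the `Meter` dataclass, the `PreconditionFailed` exception, and the draft-then-commit strategy are our own assumptions about one way to obtain atomicity.

```python
from dataclasses import dataclass, replace
from datetime import date


class PreconditionFailed(Exception):
    def __init__(self, label):
        super().__init__(f"precondition {label} failed")
        self.label = label


@dataclass
class Meter:
    lastreading: int
    lastreadingdate: date


def meter_read(mtr, newreading, newdate):
    """Sketch of METER_READ: guard with preconditions, then commit all actions
    at once, so that a failure leaves `mtr` untouched."""
    # The declaration newreading: 0..100000 acts as a type constraint.
    if not (isinstance(newreading, int) and 0 <= newreading <= 100000):
        raise PreconditionFailed("newreading: 0..100000")
    preconditions = [
        ("consumed", newreading >= mtr.lastreading),
        ("later", newdate >= mtr.lastreadingdate),
    ]
    for label, holds in preconditions:
        if not holds:
            raise PreconditionFailed(label)
    # The actions are applied to a private draft ...
    draft = replace(mtr, lastreading=newreading, lastreadingdate=newdate)
    # ... (post-conditions, had METER_READ declared any, would be checked here) ...
    # ... and committed only when everything has succeeded.
    mtr.lastreading, mtr.lastreadingdate = draft.lastreading, draft.lastreadingdate
```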


4 Classifications of constraints

Traditionally, constraints have been categorized as being static (applicable to a single
state of the IS) or dynamic (constraining transitions between states). Constraints have
also been variously classified according to syntactic criteria, in work dealing with efficient
checking of certain subclasses of constraints.

Here, we are interested in evaluating constraints from the point of view of the problems
and costs of allowing them to be violated.

To begin with, let us consider explicitly the sources of all constraints which can be
imposed on a world model in TAXIS. First, there are the constraints inherent in the
definition of the description language itself. The following rules apply to entities stored in
the database:

• every object must belong to some class

• an object may have properties, each of which has a single value that is an object
itself (i.e., properties are functions)

• the definition of a class specifies the properties applicable to its instances

• the definition of a class associates a class with each property defined (with
the intent that this class specify the range of values allowed for that property)

• all instances of a class are instances of its superclasses

• an object can only have those attributes which are allowed by at least one of
its classes

Then there are the external semantic constraints, which ensure that

• the attribute values of a specific object belong to the class specified in
the property definition associated with the least class to which the object
belongs ("type constraints" on properties)

• the values of key attributes for a class uniquely identify an object in that
class

• all logical assertions attached to an object are true at the appropriate times

Similarly, there are inherent constraints associated with the operations of the system:

• one cannot invoke a non-existent operator, or one with an inappropriate
number of operands (syntactic condition)

• one cannot access the elements of a non-existent class (insert, get, remove),
manipulate a non-existent object (modify, remove, evaluate), or qualify by a
non-existent attribute (insert, modify, get, evaluate)

• an object cannot be deleted if there are still references to it (remove)

• arithmetic and string operators and comparators are defined only for their
usual domains (problems include mixing types, overflow, division by 0, etc.)

• equality is the only operator which can be used to compare any two values

• violations of the semantic constraints on data objects are detected by the
primitive operators
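A few of the inherent constraints on operations can be illustrated with a toy object store whose primitives refuse offending calls. The `Database` class and `InherentViolation` exception are hypothetical, and references are modelled naively as attribute values equal to an object identifier.

```python
class InherentViolation(Exception):
    pass


class Database:
    """Toy object store whose primitives enforce inherent constraints."""

    def __init__(self):
        self.schema = {}   # class name -> set of attribute names
        self.objects = {}  # object id -> (class name, attribute dict)

    def define(self, cls, attrs):
        self.schema[cls] = set(attrs)

    def insert(self, oid, cls, **attrs):
        if cls not in self.schema:
            raise InherentViolation(f"no such class: {cls}")
        unknown = set(attrs) - self.schema[cls]
        if unknown:
            raise InherentViolation(f"no such attribute(s): {sorted(unknown)}")
        self.objects[oid] = (cls, dict(attrs))

    def evaluate(self, oid, attr):
        if oid not in self.objects:
            raise InherentViolation(f"no such object: {oid}")
        cls, attrs = self.objects[oid]
        if attr not in self.schema[cls]:
            raise InherentViolation(f"no such attribute: {attr}")
        return attrs.get(attr)

    def remove(self, oid):
        if oid not in self.objects:
            raise InherentViolation(f"no such object: {oid}")
        # References are modelled naively: an attribute value equal to an object id.
        referrers = sorted(other for other, (_, attrs) in self.objects.items()
                           if other != oid and oid in attrs.values())
        if referrers:
            raise InherentViolation(f"{oid} is still referenced by {referrers}")
        del self.objects[oid]
```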

Explicit constraints on operations can be expressed by

• specifying an attribute of a class to be non-modifiable

• constraining the entities which can be associated with a transaction, such as
parameters and local variables ("type constraints" for variables in transactions)

• stating prerequisites and other assertions spread throughout programmer-defined
transactions
Considering the use made of assertions in an Information System, we distinguish two important classes of constraints: policy constraints and essential constraints. A constraint is considered to be essential if and only if allowing it to be violated may lead to the incorrect operation of the Information System at some future date. For example, the constraint that an employee should not earn less than his manager, or that his salary cannot decrease, would normally be a policy constraint, since if the salary were in fact reduced, no new problems would arise in the use of the IS. On the other hand, violating the rule that the meter reading must be some integer (by storing no value or by storing a value of a different type) may lead to arithmetic operations on undefined values and hence produce unexpected, nonsensical results; similarly, violating the constraint that key property values must be unique will lead to nondeterminism in retrieving an object with get, where this nondeterminism was not expected, and hence will likely lead to erroneous results. Hence such constraints are essential.

At first sight it would seem that the only non-essential constraints are those stated as first order assertions for data classes or as prerequisites in transactions. Unfortunately, this is not correct. Suppose that there is an Integrity Constraint of the form ∃x φ(x), which is supposed to hold at all times; a programmer might use this knowledge to assume that the query get y such that φ(y) will always return an object, and hence not include code to check whether y is nothing. If one then tries to evaluate y.p, the result will be a violation of an inherent constraint of TAXIS. Conversely, not all "type constraints", which limit the range of values for a property, need to be essential constraints. For example, the constraint

    EMPLOYEE..salary = 10000:50000

seems to be in part a policy constraint (EMPLOYEE..salary = SVALUE is more essential).

Although the distinction between essential and policy constraints appears to be vague, for a particular IS one could recognize the essential constraints as

    those assertions which would be used as lemmas in a formal proof of correctness and termination of any transaction which could be invoked by the user.

The proof would, however, have to be very precise and include checking that every primitive operation in the programming language executes without raising an unhandled exception which would abort it (e.g., checking for overflow). In practice, it is of course impossible to check mechanically whether some constraint is essential or not, so we must rely on the system designers to classify constraints appropriately.
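To make the distinction concrete, here is a minimal Python sketch, assuming each constraint has already been tagged by the designer as essential or policy (as noted above, this cannot be decided mechanically). The names `Constraint`, `ConstraintChecker` and `admit` are hypothetical, not TAXIS features: an update that violates an essential constraint is rejected, while a policy violation is recorded and allowed to stand.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Constraint:
    name: str
    check: Callable[[dict], bool]  # predicate over a candidate object
    essential: bool                # set by the designer, not inferred


@dataclass
class ConstraintChecker:
    constraints: List[Constraint]
    violation_log: List[str] = field(default_factory=list)

    def admit(self, obj: dict) -> bool:
        """Reject obj if an essential constraint fails; log (but allow) policy failures."""
        if not all(c.check(obj) for c in self.constraints if c.essential):
            return False
        for c in self.constraints:
            if not c.essential and not c.check(obj):
                self.violation_log.append(c.name)
        return True


def is_int(v) -> bool:
    return isinstance(v, int) and not isinstance(v, bool)


checker = ConstraintChecker([
    Constraint("salary is an integer", lambda e: is_int(e.get("salary")), essential=True),
    Constraint("salary in 10000..50000",
               lambda e: is_int(e.get("salary")) and 10000 <= e["salary"] <= 50000,
               essential=False),
])
high_salary_ok = checker.admit({"salary": 60000})   # policy violation: stored, logged
missing_salary_ok = checker.admit({"salary": None})  # essential violation: rejected
```

Here the salary range from the EMPLOYEE example is treated as a policy constraint, while the requirement that a salary be an integer at all is treated as essential.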

In order to increase flexibility in handling exceptions, we wish to allow constraints to be violated yet continue with the operation which raised the violation. Clearly this will not lead to problems for policy constraints which are not inherent. On the other hand, there are a few inherent constraints which so pervade the design of the language that it becomes practically impossible to violate them without inventing an entirely new language. We could make the case that the following are such constraints for TAXIS:

• everything is an object

• every object must belong to some class

• objects can be related to each other only through properties, which are single-valued functions

• the definition of a class specifies the properties applicable to its instances and associates a class with each of them

• one cannot invoke a non-existent operator, or one with an inappropriate number of operands (syntactic condition)

In the next sections we will discuss some of the problems arising when we consider allowing violations of other essential constraints to persist.


4.1 Type constraints for properties in Semantic Models

The notion of type, as used in Programming Languages, allows one to associate a name with some subset of the universe of data values and to specify a group of predefined operators which can be applied to these values.

In Semantic Models classes are used as types, with entity types being the only novel concept and with the conspicuous absence of such compound types as arrays. With every variable (including properties of entities) we then associate a type. In order to make the discussion simpler, we will henceforth view variables of procedures as properties of the procedure object (as in TAXIS), and hence concern ourselves only with the type of values for properties of objects.

The classes of a semantic model can usually be categorized as extensionally defined, if their set of instances is defined explicitly by enumeration, or intensionally defined, if the instances are selected from another class through a predicate or boolean-valued procedure. In an idealized semantic model, each class would belong to exactly one of the following groups:

    anyStringClass
    anyBooleanClass
    anyIntegerClass
    anyRealClass
    anyEnumerationClass
    anyRecordClass
    anySetClass
    anySequenceClass
    anyEntityClass

Some classes, such as Integer, Real, String and Boolean, have their extension defined by the language specification or implementation. The programmer can define new EnumerationClasses, which correspond to the standard enumerated types introduced in Pascal.

Additional extensionally defined classes can be obtained by labelled Cartesian product (ordinary records), as in ADDRESS(city: CITIES, zip: INT), and by set and sequence creation (e.g., set of PERSON).

Finally, the designer can specify EntityClasses, whose extensions are explicitly built up using addition and deletion at run time (e.g., CUSTOMER). Instances of such entity classes can have properties.

Intensionally specified classes are defined using the syntax

    C := D such that P(this)

where P(this) is a boolean-valued expression with one variable, this, ranging over D. For example:

    CHILD := PERSON such that (this.age < 12)
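As an illustration of the `such that` form, the following Python sketch (a hypothetical `IntensionalClass`, with persons modelled as plain dictionaries) treats membership in C as membership in D plus the predicate P, and computes the extension by filtering the instances of D rather than storing it.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List


@dataclass(frozen=True)
class IntensionalClass:
    """C := D such that P(this)."""
    name: str
    in_base: Callable[[Any], bool]     # membership test for the base class D
    predicate: Callable[[Any], bool]   # P(this)

    def __contains__(self, x: Any) -> bool:
        return self.in_base(x) and self.predicate(x)

    def extension(self, base_instances: Iterable[Any]) -> List[Any]:
        # Never stored explicitly: derived by filtering the instances of D.
        return [x for x in base_instances if x in self]


def is_person(x: Any) -> bool:
    return isinstance(x, dict) and "age" in x


CHILD = IntensionalClass("CHILD", is_person, lambda this: this["age"] < 12)
people = [{"name": "ann", "age": 8}, {"name": "bob", "age": 40}]
children = CHILD.extension(people)
```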

Very frequently we need to specify subranges of numbers and strings. These could be considered to be intensionally specified:

    YEARS  := INTEGER such that -5000 < this & this < 2000
    SALARY := REAL such that digits(this) = 8 & 000000.00 <= this & this <= 200000.00
    NAMES  := STRING such that length(this) = 30

but they could also be considered enumerated, especially if we provide special syntax for them:

    YEARS  := {| -5000 :: 2000 |}
    SALARY := {| 0 :: 200000.00, digits(6,2) |}
    NAMES  := STRING(30)
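Whichever syntax is used, each subrange amounts to a membership test. A minimal Python sketch, assuming salaries are held as `decimal.Decimal` values and reading `digits(6,2)` as six integer and two fractional digits (our interpretation); the function names are hypothetical, and STRING(30) is read here as "at most 30 characters".

```python
from decimal import Decimal


def in_years(x) -> bool:
    # YEARS := INTEGER such that -5000 < this & this < 2000
    return isinstance(x, int) and not isinstance(x, bool) and -5000 < x < 2000


def in_salary(x) -> bool:
    # SALARY := {| 0 :: 200000.00, digits(6,2) |}
    # Assumption: at most 2 digits after the decimal point; the 6 integer
    # digits are then implied by the upper bound 200000.00.
    if not isinstance(x, Decimal) or not x.is_finite():
        return False
    return x.as_tuple().exponent >= -2 and Decimal("0") <= x <= Decimal("200000.00")


def in_names(x) -> bool:
    # NAMES := STRING(30); assumption: shorter strings are allowed (padded).
    return isinstance(x, str) and len(x) <= 30
```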


4.2 The problem of measurements

In order to maintain the ability to interpret the data in an IS correctly, we must impose constraints to prevent operations which yield nonsensical answers. It would seem particularly important in ISs to prevent operations such as

    (a) adding 12 (US dollars) to 48 (French francs);
    (b) adding 12 (miles) to 48 (km/hr) times 30 (gallons);
    (c) adding 12 (oranges) to 14 (miles/hr);
    (d) assigning 12 (ORANGES) + 14 (APPLES) to variable Quant (of type APPLES);

whereas we would want to allow

    (e) 12 (miles) + 48 (mi/hr) * 2 (hrs);
    (f) assigning 12 (minutes) to a variable Duration (in HOURS), with the appropriate conversion;
    (g) assigning 12 (oranges) + 14 (apples) to variable Quant (counting FRUITS).
In fact, the great majority of values stored in databases are counts or measurements of various kinds (only purely mathematical objects have no associated units), and it would seem important to be able to represent this information in an IS. Ideally, this would mean that every quantity or count is in fact stored as a 2-tuple of its magnitude and units. Other approaches to this problem include (a) defining isomorphic but incompatible images of the integers, as well as subtypes (e.g., Ada), and (b) defining measurements as abstract data types ([Hilfinger 81], [Lucas 84]). The first approach fails to deal with the problems of conversion and of arithmetic operations such as multiplication (examples e and f). The second approach has so far not been extended to deal with object counting, nor to the syntactic use of default units.
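The "ideal" representation mentioned above, each quantity stored as a (magnitude, units) pair, can be sketched in Python as follows. This is an illustration under our own assumptions, not a proposal from the text: units are reduced to base units with exact `fractions.Fraction` factors, dimensions are tracked as exponents so that miles per hour times hours yields miles and minutes convert to hours, and the unit table and all names are hypothetical. Treating oranges and apples as distinct dimensions rejects their sum; counting both as FRUITS would need an extra rule that this sketch does not attempt.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Tuple

Dims = Tuple[Tuple[str, int], ...]   # sorted (dimension, exponent) pairs

# Hypothetical unit table: unit name -> (factor to base unit, dimension).
UNITS = {
    "mi": (Fraction(1), "length"),
    "hr": (Fraction(1), "time"),
    "min": (Fraction(1, 60), "time"),
    "oranges": (Fraction(1), "oranges"),
    "apples": (Fraction(1), "apples"),
}


def _combine(a: Dims, b: Dims, sign: int) -> Dims:
    exps = dict(a)
    for d, e in b:
        exps[d] = exps.get(d, 0) + sign * e
    return tuple(sorted((d, e) for d, e in exps.items() if e != 0))


@dataclass(frozen=True)
class Quantity:
    magnitude: Fraction   # stored in base units
    dims: Dims

    def __add__(self, other: "Quantity") -> "Quantity":
        if self.dims != other.dims:
            raise TypeError(f"cannot add {self.dims} to {other.dims}")
        return Quantity(self.magnitude + other.magnitude, self.dims)

    def __mul__(self, other: "Quantity") -> "Quantity":
        return Quantity(self.magnitude * other.magnitude, _combine(self.dims, other.dims, 1))

    def __truediv__(self, other: "Quantity") -> "Quantity":
        return Quantity(self.magnitude / other.magnitude, _combine(self.dims, other.dims, -1))

    def in_unit(self, unit: str) -> Fraction:
        factor, dim = UNITS[unit]
        if self.dims != ((dim, 1),):
            raise TypeError(f"{self.dims} cannot be expressed in {unit}")
        return self.magnitude / factor


def q(value, unit: str) -> Quantity:
    factor, dim = UNITS[unit]
    return Quantity(Fraction(value) * factor, ((dim, 1),))
```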


For the sake of expediency, we will adopt approach (a) above, by allowing class definitions of the form

    C := like D

and enforcing the rule that objects of type C cannot be mixed with other types in arithmetic expressions, except for constants. Of course, we will have to allow for procedures which convert values from one domain to another.
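A minimal Python sketch of the `like` rule, assuming the base domain D is the integers: each class declared with `like` gets its own wrapper type, values of different such types cannot be added, plain integer constants can, and crossing domains requires an explicit conversion procedure. The class names follow the fruit examples above; `Like` and `to_fruits` are hypothetical.

```python
class Like:
    """Base for classes declared as `C := like D`, with D the integers."""

    def __init__(self, value: int):
        self.value = value

    def _operand(self, other) -> int:
        if type(other) is type(self):
            return other.value
        if type(other) is int:  # plain constants may be mixed in
            return other
        raise TypeError(f"cannot mix {type(self).__name__} with {type(other).__name__}")

    def __add__(self, other):
        return type(self)(self.value + self._operand(other))

    __radd__ = __add__

    def __eq__(self, other):
        return type(other) is type(self) and other.value == self.value

    def __repr__(self):
        return f"{type(self).__name__}({self.value})"


class APPLES(Like):
    pass


class ORANGES(Like):
    pass


class FRUITS(Like):
    pass


def to_fruits(x: Like) -> FRUITS:
    """Explicit conversion procedure between domains (hypothetical)."""
    if not isinstance(x, (APPLES, ORANGES)):
        raise TypeError(f"no conversion from {type(x).__name__} to FRUITS")
    return FRUITS(x.value)
```

As the text observes for approach (a), nothing here handles multiplication of units or automatic conversion; every cross-domain step has to be written by hand.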
5 Violations of type constraints

In this section we take a detailed look at the ways in which type constraints could sensibly be violated in an IS, together with examples of such problems. One goal is to make us aware of the problems we may encounter if we allow such violations to persist.


5.1 On null values

One of the most common ways in which type rules may be violated, especially in an IS, is by an attribute having a "null" value. The following taxonomy of the various uses of "null" is related to that presented in [ANSI X3/SPARC 75], but is explicitly based on the view of the database as our potentially incomplete knowledge of the world.

Consider attribute p of entity b in the real world. (We will use b^ to refer to the object representing b in the DB, and assume that b^.p represents our knowledge of b.p.)

First, we have the case of complete knowledge about the world: it is known that there is no object which is related to b by property p (e.g., a person currently has no spouse, a man has no maiden name). One could distinguish the subcase where this value could never exist.

The remaining cases represent incomplete world knowledge.

It could be the case that the object b.p is known to exist in the world, but

    NoInformation: we have no information about it; [ANSI X3/SPARC 75] distinguishes the subcases where the information can never be acquired because of policy or historical accident;

    Approximate: we only have an approximation of it, possibly including a set of values to which it belongs (e.g., error bounds);

    NonRepresentable: we cannot represent the value in a way which can be manipulated by this computer (e.g., overflow, underflow, string too long).

Finally, we may have no knowledge about the existence of the object at this moment, and

    NoInformation: so far, no information has been put into the knowledge base about b.p. This corresponds to an uninitialized variable b^.p in a standard programming language. An object related to b by p may or may not exist in the world (viz. the ni value in [Zaniolo 84]).

    IllDefined: although an attempt has been made to store information about b.p, the value was the result of a computation which did not complete successfully (e.g., b^.p := y/0;). It is not clear that IllDefined nulls are different from NoInformation.

    OutDated: b.p may have changed, or may not even exist, at this point. We could distinguish the case when we know for sure that b^.p does not represent b.p.

It is very likely that the above list is incomplete and that finer distinctions can be made; we will in fact keep this issue in mind when designing the language mechanism.

Also, null values (especially invalid information) will be crucial in any system which allows operations to continue even after a violation has been raised. They can be used to extend the domain of most primitive operators so that they become total (e.g., division by 0 returns null), and then to propagate null in further operations, possibly using an algebra of nulls along the lines suggested in [Wetherell 82].

Any attempt to perform operations on nulls other than checking for identity should normally raise an exception, since the result would not model reality appropriately. Of course, one should be able to operate on the various approximate values once they are explicitly acknowledged to be exceptional.
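A small Python sketch of the two behaviours just described, under the assumption that a single `NULL` marker stands in for all kinds of null. Division is made total by returning `NULL` on a zero divisor; by default any further operation on `NULL` raises an exception, while code that has explicitly acknowledged the exceptional situation can pass `propagate=True` to let `NULL` flow through. The names `div`, `add`, `NullOperandError` and the `propagate` flag are hypothetical.

```python
class NullValue:
    """A single marker for null; a fuller design would distinguish the kinds of null."""

    def __repr__(self) -> str:
        return "NULL"


NULL = NullValue()


class NullOperandError(Exception):
    """Raised when an operation other than an identity test is applied to NULL."""


def _has_null(args, propagate: bool) -> bool:
    if any(a is NULL for a in args):
        if propagate:
            return True
        raise NullOperandError("operation applied to a null value")
    return False


def div(x, y, propagate: bool = False):
    # Totalized division: division by 0 yields NULL instead of aborting.
    if _has_null((x, y), propagate):
        return NULL
    return NULL if y == 0 else x / y


def add(x, y, propagate: bool = False):
    if _has_null((x, y), propagate):
        return NULL
    return x + y
```

Identity tests (`v is NULL`) remain available in both modes, which matches the text's exception for checking identity.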


5.2 Violations of numeric type

The important property of numeric values is that one can perform arithmetic operations on them. The following examples point out instances where type rules could reasonably be breached:

Non-numeric value for numeric magnitudes

    e.g., a qualitative answer to an ordinarily quantitative question, such as Temperature = 'normal' in a patient's medical chart. Such examples seem to belong under null Estimates, since operations on them should normally not be allowed. If numbers are used for reasons other than magnitude, e.g., zip codes, exceptions may arise in cases where alphanumerics occur (e.g., Canadian zip codes have letters). However, this is presumably a design mistake, since zip codes should be images in any case.

Numeric, but out of normal range

    a typical problem in reality (people do earn high salaries and get very old, payment rates can be negative, etc.). Operations appear to be performable in the same manner as for the normal range, though their semantics become questionable when normally positive values are in fact negative (e.g., what does a negative electric bill mean?), in which case more circumspection, as with estimates, is required.

Numeric, but of different accuracy

    e.g., a different instrument is used to measure some value. This is somewhat like an estimate, but the value could be more accurate (e.g., have a decimal portion where ordinarily there is only an integer); such values could probably be used in operations normally.

Numeric, but with a different unit

    e.g., an investment reported in a foreign currency; note that conversion to the same unit is not always possible, as in this case, where exchange rates fluctuate.


5.3 Other type violations

Multiple-valued attribute

    e.g., someone with more than one manager, or in more than one department. This problem is quite ubiquitous, and the values could be stored as sets, but there are grave difficulties in evaluating expressions which assume single values (PRINT(x.dept.name)).

Not in an enumerated list

    e.g., the rank of Faculty is normally {aP, AP, P}, but someone is hired as a Lecturer or Instructor. One problem raised by such violations is that Pascal-like comparators are not defined for these values.

Not in a Cartesian-product class

    e.g., an address with an additional attribute, such as county (required in many countries). This is similar to the general problem of wishing to have additional attributes for entities in some class (missing attributes can have null values), and clearly requires tools to look at attributes which are not known to exist at compile time.

Not in the entity class specified

    e.g., an advisor who is not a Professor, just an Administrator. All operations on the exceptional value, other than comparison for identity, become problematic, especially attribute evaluation; if the entity belongs to some common superclass, fewer problems can be expected. As mentioned above, another exception of this general kind occurs when we wish to qualify an object with an attribute which is not sanctioned by the classes of which it is a member (e.g., an employee who has special skills). Traditionally, "comment fields" in record-based systems have been used for this purpose.

Set or sequence value with values from an enlarged universe

    e.g., a student has an attribute was-taught-by, whose value is supposed to be a set of PROFESSORs, yet one of his/her teachers was in fact someone from industry, who was not a professor. In addition to the problems mentioned above for "not in the entity class specified", implementation problems may arise with Pascal-like sets, which would lead to problems in performing set operations.

It should be noted that all values which cause type errors could lead to problems with conditional control structures such as ELSE clauses, since normal application programs would not anticipate them.
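For the multiple-valued attribute case listed above, the following Python sketch (objects as dictionaries, hypothetical helpers `path` and `path_all`) shows why an expression such as PRINT(x.dept.name) breaks: an evaluator written for single-valued properties fails as soon as one department value becomes a collection, and the set-oriented alternative changes the result from a single value to a list, which every caller would then have to handle.

```python
from typing import Any, List

MULTI = (list, tuple, set, frozenset)


def path(obj: dict, *attrs: str) -> Any:
    """Evaluate obj.a1.a2... assuming every property is single-valued."""
    for a in attrs:
        value = obj[a]
        if isinstance(value, MULTI):
            raise TypeError(f"property {a!r} is multi-valued here")
        obj = value
    return obj


def path_all(obj: dict, *attrs: str) -> List[Any]:
    """Set-oriented alternative: every value reachable along the path."""
    frontier = [obj]
    for a in attrs:
        step = []
        for o in frontier:
            v = o[a]
            step.extend(v if isinstance(v, MULTI) else [v])
        frontier = step
    return frontier


x = {"dept": {"name": "Sales"}}
y = {"dept": [{"name": "Sales"}, {"name": "R&D"}]}  # exceptional: two departments
```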


5.4 The roles of type constraints

From the point of view of modeling enterprises, the primary purpose of type constraints is to maintain the validity of the information by detecting incorrect data entry and inappropriate application of operators (e.g., arithmetic cannot be performed on strings).

In compiled languages, type information allows the mechanical detection of certain programming errors before program execution. In particular, one can flag operations which are known to be applied outside their domain, by using knowledge about the types of the arguments and the disjointness of certain types. This is particularly important for the development of reliable software, and is the cornerstone principle of "strongly typed" languages.

In languages with type hierarchies, type information can be used to optimize the code generated at compile time by eliminating redundant checks. For example, if TYPE(x) IS-A TYPE(y) IS-A INTEGERS, then in the statement y := x the assignment operator need not check whether the type constraint for y would be violated, and in the expression (x + 2) the addition operation need not check that its operands are integers.
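The compile-time reasoning in this example can be sketched in Python with a hypothetical single-inheritance hierarchy, where `SMALL_YEARS` plays the role of TYPE(x) and `YEARS` that of TYPE(y): walking up the IS-A chain decides whether the generated code may omit a check.

```python
from typing import Dict, Optional

# Hypothetical type hierarchy: type -> its immediate supertype.
SUPERTYPE: Dict[str, Optional[str]] = {
    "SMALL_YEARS": "YEARS",
    "YEARS": "INTEGERS",
    "INTEGERS": None,
}


def is_a(sub: str, sup: str) -> bool:
    t: Optional[str] = sub
    while t is not None:
        if t == sup:
            return True
        t = SUPERTYPE.get(t)
    return False


def assignment_needs_check(type_of_x: str, type_of_y: str) -> bool:
    """For y := x, a run-time check is needed unless TYPE(x) IS-A TYPE(y)."""
    return not is_a(type_of_x, type_of_y)


def addition_needs_check(type_of_operand: str) -> bool:
    """Integer addition needs an operand check unless the type IS-A INTEGERS."""
    return not is_a(type_of_operand, "INTEGERS")
```

Once violations of type constraints are allowed to persist, this reasoning is only sound for values known not to be exceptional, which is the source of the costs listed in 5.5.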
Type information is also useful in deciding how much memory to allocate to each variable at compile time (e.g., information about the length of strings, magnitude of integers, etc.).

The knowledge that certain types are disjoint can sometimes be exploited to achieve more efficient storage. For example, by making numbers and strings disjoint types we can use the same bit pattern to represent two distinct objects (hence requiring fewer bits to represent all possible values). The distinction between the two objects can, of course, be recaptured using the type information.


5.5 Implementation problems when allowing violations of type constraints

Assuming that we have found an exception to some integrity constraint expressed by a type rule for property p of entity b, we would incur at least the following costs if we allowed the constraint to be violated:

    1. Exceptional objects would have to carry type information along at run time.

    2. In the cases where we cannot infer that a particular attribute is not exceptional, we have to generate code for run-time type checking. This could, but need not, be especially expensive in loops.

    3. More complex storage structures and increased access time would be required for exceptional values, since they need to be allocated storage space in places other than the normal ones.
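One way to keep these three costs confined to exceptional values is a side table, sketched here in Python with hypothetical names; the temperature data reuses the 'normal' chart entry from 5.2. Conforming values are stored without tags and fetched directly; only a condoned violation carries its actual type and pays for the extra lookup and the run-time check.

```python
from typing import Any, Dict, Tuple

Key = Tuple[str, str]   # (object, property)


class PropertyStore:
    """Normal values are stored untagged; condoned type violations go to a side table."""

    def __init__(self) -> None:
        self.normal: Dict[Key, Any] = {}                   # type implied by the declaration
        self.exceptional: Dict[Key, Tuple[str, Any]] = {}  # (actual type name, value)

    def store(self, obj: str, prop: str, value: Any, actual_type: str = "") -> None:
        key = (obj, prop)
        if not actual_type:                 # value conforms to the declared type
            self.exceptional.pop(key, None)
            self.normal[key] = value
        else:                               # cost 1: carry the run-time type along
            self.normal.pop(key, None)
            self.exceptional[key] = (actual_type, value)

    def fetch(self, obj: str, prop: str, declared_type: str) -> Any:
        key = (obj, prop)
        if key in self.normal:              # fast path for normal values
            return self.normal[key]
        actual_type, value = self.exceptional[key]   # cost 3: a second structure
        if actual_type != declared_type:             # cost 2: run-time type check
            raise TypeError(f"{prop}@{obj} holds a {actual_type}, not a {declared_type}")
        return value
```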


6 Violations of First Order Semantic constraints

The data in an IS described in a Semantic Model has a particularly simple interpretation in logical terms. The objects form a domain over which functions and predicates are defined. All information can in fact be represented through 3 simple predicates:

    IsInstanceOf(obj, class)
    HasDomain(class, property, class)
    IsA(class, class)

and a group of functions, one for each attribute. (These functions could be replaced by one predicate HasValue(obj, property, value).)

A semantic condition is then violated whenever these functions and predicates are defined in such a way that the assertion has false as its truth value. When we wish to allow a violation of a formula to persist, we are inclined to say that the failure of the assertion is due to some fact being "exceptional". Based on the previous view, it is clear that what is exceptional is NOT a particular value or object (e.g., the number 50,000), nor a particular attribute (e.g., salary), but the value of a function at an argument or the truth value of a predicate at some argument. We will use the notation "p@b is exceptional" to indicate that p(b), as a function or predicate, has been associated with the failure of some assertion.

When an assertion fails, it may not be obvious why it has done so. It is important to point out that:

• a violation of some assertion need not be the result of a single exceptional value. For example, if for some employee e, (e.salary < e.manager.salary) fails to hold, then this may be due to one or more of e.salary, e.manager and e.manager.salary being exceptional;

• a particular update may bring to light a previous error, so the update detecting the error need not be the exceptional case;

• it is not always possible to deduce from the assertion alone what is "causing" the formula to fail (e.g., formulas with existential quantifiers and disjunction).

It is therefore crucial to have an outside agent intervene to blame particular facts for a violation, and this is a sensible approach in the case of ISs.
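The logical reading of this section can be made concrete with a small Python sketch (all data and function names are hypothetical). Facts are kept as the three predicates plus one dictionary per attribute function. Checking the assertion e.salary < e.manager.salary yields the arguments at which it is false but, as the bullets point out, not which of the facts involved is to blame, so the sketch can only list the candidate facts p@b for an outside agent to choose from.

```python
# Hypothetical database in the three-predicate form of this section.
IS_INSTANCE_OF = {("joe", "EMPLOYEE"), ("sue", "EMPLOYEE"), ("ann", "MANAGER")}
IS_A = {("MANAGER", "EMPLOYEE")}
HAS_DOMAIN = {("EMPLOYEE", "salary", "INTEGER"), ("EMPLOYEE", "manager", "EMPLOYEE")}
# One function per attribute, represented as a dictionary from argument to value.
ATTR = {
    "salary": {"joe": 52000, "sue": 48000, "ann": 90000},
    "manager": {"joe": "sue", "sue": "ann"},
}


def is_instance(o, cls) -> bool:
    if (o, cls) in IS_INSTANCE_OF:
        return True
    return any(is_instance(o, sub) for (sub, sup) in IS_A if sup == cls)


def in_range(v, cls) -> bool:
    return isinstance(v, int) if cls == "INTEGER" else is_instance(v, cls)


def domain_violations():
    """Facts (p, o) whose value lies outside the class given by HasDomain."""
    return sorted((p, o) for (c, p, r) in HAS_DOMAIN
                  for o, v in ATTR[p].items()
                  if is_instance(o, c) and not in_range(v, r))


def failing_arguments():
    """Employees e (with a manager) at which e.salary < e.manager.salary is false."""
    sal, mgr = ATTR["salary"], ATTR["manager"]
    return sorted(e for e in mgr if is_instance(e, "EMPLOYEE") and not sal[e] < sal[mgr[e]])


def candidate_culprits(e):
    """The facts (p, b), i.e. p@b, that may be blamed when the assertion fails at e."""
    return [("salary", e), ("manager", e), ("salary", ATTR["manager"][e])]
```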

There are several issues surrounding the presence of exceptional values in an IS.

One concerns the use of such values in updates. If p@b is exceptional, then an update of the form y.q := b.p may in itself raise new exceptions (this time related to the values of q) if p@b is a type error. However, this need not always be the case, as with policy errors, or when a variable x of type 0..10 has value 12 and is assigned to a variable y of type 0..20. For example, if joe.supervisor = joe, then supervisor@joe is exceptional for the rule (∀x) x.supervisor ≠ x. Should we allow bill.supervisor := joe.supervisor? What if the original exception was only a temporary aberration, which was known to be eventually corrected? Should the correction be propagated, and if so, how?

A second issue concerns the view of assertions as integrity constraints. By allowing an exception, we ordinarily leave the assertion false. But this makes it useless as a guard against future data-entry errors, since it will continue to fail due to the earlier exceptional fact. For this reason, we will have to consider ways of modifying assertions and/or integrity checking so that new errors can be detected even after a violation has been allowed to stand.
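One possible way of modifying integrity checking is sketched below in Python, under our own assumptions (the function and data names are hypothetical): record, for each blamed fact, the value it had when the violation was condoned, and skip only those failures that are still explained by an unchanged condoned fact. The assertion then continues to catch new errors, and an update to a condoned fact automatically brings the old violation up for review.

```python
from typing import Dict, List, Tuple

Fact = Tuple[str, str]   # (property, argument), written p@b in the text


def new_violations(salary: Dict[str, int], manager: Dict[str, str],
                   excused: Dict[Fact, object]) -> List[str]:
    """Failures of  e.salary < e.manager.salary  not covered by a condoned fact.

    `excused` maps each blamed fact to the value it had when the violation was
    allowed to stand. Assumption: if that value has since changed, the excuse
    lapses, so the assertion again guards against new errors.
    """
    current = {"salary": salary, "manager": manager}
    result = []
    for e, m in manager.items():
        if salary[e] < salary[m]:
            continue                         # assertion holds at e
        culprits = [("salary", e), ("manager", e), ("salary", m)]
        if any(f in excused and excused[f] == current[f[0]][f[1]] for f in culprits):
            continue                         # failure already condoned
        result.append(e)
    return sorted(result)


salary = {"joe": 52000, "sue": 48000, "ann": 90000, "bob": 95000}
manager = {"joe": "sue", "sue": "ann", "bob": "ann"}
excused = {("salary", "joe"): 52000}         # joe's salary was condoned at 52000
```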


7 Software Engineering with exceptions

An important point concerning Software Engineering methodology is the manner in which the handling of abnormal or atypical situations is incorporated in the overall system design. It has been convincingly argued that an important ingredient in managing the complexity of software development is the notion of abstraction: the ability to ignore inessential details. This is of course coupled with a mechanism which allows the details, which after all are necessary, to be introduced in a controlled manner. One can view programming by stepwise refinement and abstract data types as instances of this pattern. It is our claim that the ability to ignore abnormal or atypical occurrences in the design of software is also an important abstraction mechanism: in writing or reading programs it is very distracting to constantly take detours from the normal flow of events to find out what is to be done in rare, special circumstances. We therefore propose to continue our research on a methodology of IS software construction by considering techniques that allow one to consider only normal situations in the first pass through a design, and then in successive passes to account for expected abnormalities. Ideally, the code produced by such a design would itself reflect this separation of concerns for normal and abnormal cases, and aid in maintenance.

In addition, we are interested in a methodology which leads the designer to anticipate and consider the great majority of error conditions that could occur, an exceedingly important task in an interactive information system. The germinal forms of such a technique are already available in TAXIS, where one can first specify transactions with preconditions, postconditions, etc., then in a second pass describe the exceptions raised by each precondition, and attach handlers to exceptions in a final pass (see [Borgida et al 82]).


8 Desiderata for a mechanism of condoning exceptions

Based on the preceding discussion, we can summarize the following properties of the exception handling mechanism we are looking for.

In order to manage exceptional, but acceptable, situations we must have mechanisms for:

1. Creating and storing persistent exceptional occurrences

   This should allow us to specify that certain facts are exceptional but must still be stored.

2. Controlling the existence of exceptional occurrences

   Not everyone should have the ability to contradict policies, so some form of protection is needed.

3. Accessing and discriminating between exceptional and unexceptional occurrences

   In particular, the mechanism must:

   • given some assertion, allow access to occurrences that are exceptional with respect to that assertion, in order to perform supervisory or managerial operations;

   • given some operation to perform (especially primitive operations such as comparators, arithmetic operations and attribute evaluation), be able to filter out from its domain objects which would be outside its domain because of exceptions.

More specifically, we appear to need the ability to:

• check whether a particular p@b is exceptional, as the cause of some violation;

• if so, obtain the assertion whose violation makes that fact exceptional;

• determine whether one of the (primitive) operations would have one of its essential conditions violated if invoked with some exceptional information (without actually invoking the operation, of course);

• examine all the attributes of an object;

• examine all the exceptional attributes of an object;

• examine all instances of a class;

• examine all instances of a class which have no exceptional attributes;

• examine all instances of a class which do not form part of an exceptional fact that violates some specific assertion;

• when an exceptional fact is updated, review the violation and possibly cancel it.
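Several of the abilities listed above suggest a registry of condoned facts. The following Python sketch is one possible shape under stated assumptions: the class and method names are hypothetical, facts are written as (property, object) pairs standing for p@b, and assertions are identified by name. It also records who authorized each exception, and keeps that record even after the exception is cancelled, for auditing purposes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Optional, Set, Tuple

Fact = Tuple[str, str]   # (property, object), i.e. p@b


@dataclass
class ExceptionRegistry:
    blamed: Dict[Fact, Set[str]] = field(default_factory=dict)   # fact -> assertions
    authorized_by: Dict[Tuple[Fact, str], str] = field(default_factory=dict)

    def condone(self, fact: Fact, assertion: str, who: str) -> None:
        self.blamed.setdefault(fact, set()).add(assertion)
        self.authorized_by[(fact, assertion)] = who   # kept even after cancellation

    def is_exceptional(self, prop: str, obj: str) -> bool:
        return (prop, obj) in self.blamed

    def assertions_for(self, prop: str, obj: str) -> Set[str]:
        return set(self.blamed.get((prop, obj), set()))

    def exceptional_attributes(self, obj: str) -> List[str]:
        return sorted(p for (p, o) in self.blamed if o == obj)

    def unexceptional(self, instances: Iterable[str],
                      assertion: Optional[str] = None) -> List[str]:
        """Instances with no exceptional fact (optionally: none violating `assertion`)."""
        def tainted(obj: str) -> bool:
            return any(o == obj and (assertion is None or assertion in names)
                       for (_, o), names in self.blamed.items())
        return [i for i in instances if not tainted(i)]

    def review(self, prop: str, obj: str, still_violated: Callable[[str], bool]) -> None:
        """After an update of p@b, cancel the exception for assertions that now hold."""
        names = self.blamed.get((prop, obj))
        if names is None:
            return
        names -= {a for a in names if not still_violated(a)}
        if not names:
            del self.blamed[(prop, obj)]
```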

For policy reversals, including dynamic exceptions detected by conditions associated with transactions, we need not worry about information left behind which could cause problems for others. However, for administrative control, we must assign to someone the responsibility for allowing the transgression to occur, and this information must be kept indefinitely for the purpose of auditing. In general, administrators may wish to access all exceptional invocations of a certain operator/transaction or, more selectively, those that resulted in the failure of some specific precondition.

In line with our goal of supporting the abstraction of dealing with abnormal cases in a separate pass, the entire exception handling mechanism, both for the case when the original violation is detected and for the case when exceptional information is accessed, should be writable as annotations to a program which deals with normal cases only.

From the point of view of implementation, we find the following desirable:

1. the presence of exceptions should not degrade the efficiency of dealing with normal information, assuming that by definition exceptions will be rare occurrences;

2. the language should continue to detect violations of constraints even after exceptions have been introduced.

More specifically, we will have to deal with the problem of compiling code for cases when we cannot be sure that type constraints are met or that key values are unique, and we will have to be especially vigilant about the increased cost of run-time checks inside loops, where they can multiply significantly.
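A minimal Python sketch of the loop concern, under the assumption that the system maintains, per property, the set of objects whose values are condoned exceptions (the function and parameter names are hypothetical). The test is hoisted out of the loop, so normal data runs through an unchecked loop and only a class that actually contains exceptions pays for per-element checks. Skipping values outside the domain of + is just one possible policy.

```python
from typing import Any, Dict, Iterable, Set


def total_salary(employees: Iterable[str], salary: Dict[str, Any],
                 exceptional_salaries: Set[str]) -> int:
    """Sum salaries, paying for per-element checks only if some salary is exceptional.

    Assumption: `exceptional_salaries` (employees whose salary is a condoned
    exception) is maintained incrementally, so testing it for emptiness is
    cheap compared with a check inside the loop.
    """
    if not exceptional_salaries:
        return sum(salary[e] for e in employees)      # unchecked loop for normal data
    total = 0
    for e in employees:
        v = salary[e]
        if e in exceptional_salaries and not isinstance(v, int):
            continue   # outside the domain of +; a real system might call a handler
        total += v
    return total
```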

9 Other issues related to exceptional occurrences in ISs


9.1 Long-term activities and communication with users

An important part of the description of an IS consists of the long-term activities in the world and the model of user dialogues with the system. For this purpose, the notion of Augmented Petri Nets (here, scripts) has been gaining acceptance. Scripts ([Zisman 77], [Barron 80]) are essentially parameterized Petri nets with states connected by transitions, which in our case have the form of condition => action pairs, and where the actions allow database updates and communication acts with users or other scripts. The following is an example describing the billing process for our hypothetical utility company:
    parameters
        c: CUSTOMER
    locals
        m: METER
    ...
    Receive termination request?
        call CLOSE_ACCOUNT(c, m);

Figure 3: Outline of FOLLOW_CUSTOMER script
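The structure of a script as a parameterized Petri net with condition => action transitions can be sketched as follows. Assumptions: the sketch is in Python, not Taxis syntax. It allows at most one token per state, and the first enabled transition fires. The state names and the `inbox`/`log` entries are hypothetical and only echo the fragment of Figure 3.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class Transition:
    inputs: Set[str]                   # states that must all be active
    outputs: Set[str]                  # states that become active on firing
    condition: Callable[[dict], bool]  # guard over parameters and locals
    action: Callable[[dict], None]     # e.g. a database update or a message


@dataclass
class Script:
    env: dict                          # parameters and locals of this script instance
    active: Set[str]                   # current marking (at most one token per state)
    transitions: List[Transition] = field(default_factory=list)

    def step(self) -> bool:
        """Fire the first enabled transition; return False if none is enabled."""
        for t in self.transitions:
            if t.inputs <= self.active and t.condition(self.env):
                self.active = (self.active - t.inputs) | t.outputs
                t.action(self.env)
                return True
        return False


# Hypothetical fragment echoing Figure 3: on a termination request, close the account.
follow = Script(
    env={"c": "customer-17", "inbox": ["termination request"], "log": []},
    active={"ACTIVE"},
    transitions=[
        Transition(
            inputs={"ACTIVE"},
            outputs={"CLOSED"},
            condition=lambda e: "termination request" in e["inbox"],
            action=lambda e: e["log"].append(f"CLOSE_ACCOUNT({e['c']})"),
        )
    ],
)
```

Calling `follow.step()` moves the marking from `ACTIVE` to `CLOSED` and records the `CLOSE_ACCOUNT` action. A second call finds no enabled transition.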


Paradoxically, the above script contains both too little and too much information about
exceptional occurrences. The problem of delayed payments is presumably not the norm in
the real world, so its description should not "clutter up" the description of the normal
cases, but should instead be specified separately as an aside, in the manner of exception
handlers. On the other hand, numerous other exceptional or rare occurrences, such as
prepayment of bills, have been omitted.

The first problem in this area is to define exactly what is meant by an exceptional
occurrence in a script. For data and activity classes we used the notion of assertion failure,
so it would make sense to associate assertions with the states of a script and require that
they be true whenever a state is active.⁶ As a starting point, we observe that scripts are
distinguished from transactions by the inclusion of information on two special subjects:
distinguished  from  transactions  by  the  inclusion  of  information  on  two  special  subjects: 

• causal/temporal information, encoded in the Petri net and in some of the
conditions on transitions;

• communication patterns, encoded in conditions and actions in the form of
message passing.

We believe that exception handling mechanisms for scripts will eventually need to exploit
the semantics of these real-world occurrences, even if in a restricted form. For this, we
should produce, among other things, a categorization of the types of information appearing in a
script, which associates with every category a list of specific exceptional situations that
the IS designer must provide for in the form of exception handlers. As a simple example,
if one of the conditions on an arc is of type temporal deadline (e.g., "30 days later"), then
a potential exception is that the deadline is missed.
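The suggestion above can be sketched in a small example. Assertions are attached to script states, and each category of information is paired with the exceptional situations it implies. The sketch assumes a single category, "temporal deadline", and a 30-day payment rule. The table `CATEGORY_EXCEPTIONS`, the state `AWAITING_PAYMENT`, and the `billed` attribute are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical categorization: each category of script information is paired
# with the exceptional situations a designer must write handlers for.
CATEGORY_EXCEPTIONS: Dict[str, List[str]] = {
    "temporal deadline": ["deadline missed"],
}


@dataclass
class StateAssertion:
    state: str
    category: str
    holds: Callable[[dict, date], bool]  # must be true while `state` is active


def required_handlers(assertions: List[StateAssertion]) -> Set[Tuple[str, str]]:
    """(state, situation) pairs the designer is reminded to provide handlers for."""
    return {(a.state, s) for a in assertions for s in CATEGORY_EXCEPTIONS.get(a.category, [])}


def violations(active: Set[str], env: dict, today: date,
               assertions: List[StateAssertion]) -> List[Tuple[str, str]]:
    """Assertions of active states that fail on `today`, as (state, category) pairs."""
    return [(a.state, a.category) for a in assertions
            if a.state in active and not a.holds(env, today)]


# "30 days later": payment is due within 30 days of billing.
due = StateAssertion(
    state="AWAITING_PAYMENT",
    category="temporal deadline",
    holds=lambda env, today: today <= env["billed"] + timedelta(days=30),
)
```

`required_handlers` can remind the designer of the handlers to write before the system runs. `violations` reports a missed deadline only while the corresponding state is active.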

Furthermore, it is probably impossible to anticipate all possible action sequences that
could take place (in an automated office, for example); hence, controlled means for adding
or performing transitions "on-line" form another subject for study.


9.2 Static Exceptions

The types of abnormalities mentioned above could be classified as dynamic, in the sense
that they arise during the operation of an IS and are due to individual deviations from the
norms set up in the IS schema. A second, static, group of abnormalities may be created
during the design of the IS itself.

As we mentioned at the beginning, our ideal is to view IS design as world modeling and
description. Contradictions may arise during this process as a result of the well-known
human limitations in dealing with multitudes of details. In particular, when an object may be
an instance of several classes, it may be impossible for that object to meet the constraints
imposed by all of them because they are contradictory. This problem arises especially
when classes are organized in a generalization hierarchy.⁷ Ideally, all instances of a class
are instances of all its superclasses, and hence all constraints stated for a class must hold
for every one of its subclasses, although such a constraint may be strengthened. For
example, in a top-down description of an IS for a hospital, it is natural that the designer
might include in the description of the PATIENT class that each instance must have a blood
pressure taken upon admission. This will be appropriate for the vast majority of subclasses
of PATIENT, such as SURGICAL, PEDIATRIC, etc., which will be described later in
successive passes; in this sense, inheritance (the ability to have such a constraint
automatically imposed on all subclasses of PATIENT without it having to be explicitly
restated) is a very important abbreviatory device, which saves time and space as well as
preventing simple errors due to carelessness. In fact, we plan to use inheritance hierarchies
also to organize exceptions, handlers and excuses, as well as to provide for defaults.
However, humans often over-abstract in order to deal with the complexities of nature.
Thus, a few special subclasses, such as PSYCHIATRIC and THALIDOMIDE, do not have this
constraint imposed. If this fact is correctly recorded in the model being built, yet the
instances of these two classes are still considered to be instances of PATIENT (which is
very likely the way users in the hospital view it), there will be a contradiction. We wish to
extend our research area to include the problems raised by such static exceptions
occurring during IS design.

⁶ This was suggested by John Mylopoulos in private conversation.

⁷ See [Borgida et al. 82] for a recent discussion of the importance of generalization as an
abstraction principle in IS design.

Among the alternatives to be considered, we list:

• Warning the designer of the contradiction but using some form of default
rule, which selects only one of the rules to be applied to any object. The
main problem with this approach is the non-locality caused by the need to
know all intervening classes when inheritance operates, as well as the related
implementation issues (e.g., if PERSONS can be uniquely identified by their SS#
but CHILDREN, who are PERSONS, cannot, how should one store these classes
of objects in a record-based DBMS, such as a relational system?).

• Splitting the schema automatically to avoid clashes (e.g., create PERSON1 and
PERSON2 subclasses of PERSON, attach the SS# constraint to PERSON1, and
make CHILDREN IS-A PERSON2); the problems with this approach include
combinatorial explosion and forcing the designer and eventual users to use
possibly unnatural classes.

• Providing some similarity mechanism, such as that in [Lesperance 80], which
allows the definition of classes by noting differences: essentially inheritance
without placement in an IS-A hierarchy.

The final proposal must be a compromise between facilitating the task of the designer and
preserving the conceptual integrity of the resulting schema.
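The hospital example and the first alternative, a warning plus a default rule, can be illustrated with a short sketch. Constraints are inherited down an IS-A hierarchy, and a subclass may cancel one of them. The code reports the clash and then applies one possible default rule: the nearest class that mentions a constraint decides. This rule, and the use of breadth-first order to define "nearest", are assumptions. Under multiple inheritance such an order is arbitrary, which is the non-locality problem noted above. The tables `ISA`, `REQUIRES`, `CANCELS` and the constraint name `bp_on_admission` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# class -> direct superclasses (IS-A links)
ISA: Dict[str, List[str]] = {
    "PATIENT": [],
    "SURGICAL": ["PATIENT"],
    "PSYCHIATRIC": ["PATIENT"],
    "THALIDOMIDE": ["PATIENT"],
}
# constraints each class states, or cancels, directly
REQUIRES: Dict[str, Set[str]] = {"PATIENT": {"bp_on_admission"}}
CANCELS: Dict[str, Set[str]] = {
    "PSYCHIATRIC": {"bp_on_admission"},
    "THALIDOMIDE": {"bp_on_admission"},
}


def ancestors(cls: str) -> List[str]:
    """cls followed by all its superclasses, in breadth-first (nearest-first) order."""
    order: List[str] = []
    frontier = [cls]
    while frontier:
        c = frontier.pop(0)
        if c not in order:
            order.append(c)
            frontier.extend(ISA.get(c, []))
    return order


def contradictions(cls: str) -> List[Tuple[str, str, str]]:
    """(constraint, requiring class, cancelling class) clashes visible from cls."""
    chain = ancestors(cls)
    return [(k, r, c) for r in chain for k in REQUIRES.get(r, set())
            for c in chain if k in CANCELS.get(c, set())]


def effective_constraints(cls: str) -> Set[str]:
    """Default rule (one possible choice): the nearest class mentioning a constraint wins."""
    result: Set[str] = set()
    decided: Set[str] = set()
    for c in ancestors(cls):
        for k in REQUIRES.get(c, set()) | CANCELS.get(c, set()):
            if k not in decided:
                decided.add(k)
                if k in REQUIRES.get(c, set()):
                    result.add(k)
    return result
```

Here `contradictions("PSYCHIATRIC")` gives the designer the warning, while `effective_constraints` decides which rule actually applies. SURGICAL patients keep the blood-pressure constraint and PSYCHIATRIC patients do not.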


10 Summary

We feel that very useful leverage can be achieved in research by restricting attention to
a subclass of all possible computer systems: one with some clear commonalities and
wide practical import. Information Systems form a prime example of such a subclass.

We have argued in favor of the following propositions in the context of Information
Systems:

• The large number of possible special cases makes it imperative that the design
of ISs follow some abstraction principle which postpones dealing with them
to a later stage; at the same time, the methodology should help the
designer remember to deal with the abnormal cases.

• Although ISs need to maintain strong control over the quality of the data stored
in them, this must co-exist with the possibility of deciding at run-time that
some fact is accurate yet violates a constraint built into the IS.

• A complete exception handling mechanism for ISs must then provide the
ability to continue operations even after a violation of some constraint has
been detected, although this must be subject to control and must leave audit trails.

• However, enough information should be entered to allow other users of
shared information to avoid tripping over exceptional information in the
databases, especially in the case of type constraint violations.

We have paid particular attention to the problem of type constraints and their violations,
since these are the most difficult kinds of constraints to suspend.

In a forthcoming report we will present the actual exception handling mechanism that has
been designed based on the issues considered in this paper, and in a separate report we
will present a logical reformulation of the problem of exceptions.